
LOCAL ASSESSMENT OF PERTURBATIONS 





By 


GLEN LAWSON HARTLESS 







A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL 

OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT 

OF THE REQUIREMENTS FOR THE DEGREE OF 

DOCTOR OF PHILOSOPHY 

UNIVERSITY OF FLORIDA 

2000 






^ *".• . ■ . 



Copyright 2000 

by 
Glen Lawson Hartless 



To Christy 









ACKNOWLEDGMENTS 

I would first like to thank my chairman, Jim Booth, and cochairman, Ramon
Littell, for all of their help and guidance. Their combined knowledge, insight, and
mentoring have been extraordinary. I would also like to express my gratitude to them
for allowing me to pursue avenues of research that are probably closer to my heart
than theirs. Thanks also go to the other esteemed professors who served on my
supervisory committee: James Algina, Randy Carter, and James Hobert.

I would also like to acknowledge people who have influenced me academically 
and professionally over the years: Chuck Nowalk, John Grant, and many others 
in the Phillipsburg and Lopatcong school systems, Walter Pirie and the faculty 
at Virginia Tech, Ron DiCola at AT&T/Lucent, Larry Leemis at the College of 
William & Mary, Danny Perkins, Bo Beaulieu, and many others at the University of 
Florida, and also fellow students Jonathan Hartzel, Phil McGoff, and Galin Jones, 
among many others. I would also like to thank Allan West for maintaining an 
extremely reliable computing environment. 

Thanks also go to my parents for their love and support and for instilling in 
me the discipline needed to achieve my goals. 

Finally, my utmost thanks go to my wife, Christine. I am forever grateful for 
her unequivocal love, unending patience, and thoughtful insight. 






TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION TO LOCAL INFLUENCE
1.1 Premise
1.2 Measuring the Effects of Perturbations
1.3 Local Influence Diagnostics
1.4 Perturbations in Regression
1.5 Basic Perturbations
1.6 Brief Literature Review
1.7 Outline

2 COMPUTATIONAL ISSUES IN LOCAL INFLUENCE ANALYSIS
2.1 Curvatures and the Direction of Maximum Curvature
2.2 Local Influence Analysis of a Profile Likelihood Displacement
2.3 Building Blocks of Local Influence for Regression
2.4 Perturbations to the Response in Regression
2.5 Unequal Measurement Precision Applied to Regression
2.6 Benchmarks for Perturbation Analyses
2.7 Local Assessment Under Reparameterizations

3 PERTURBATIONS TO THE X MATRIX IN REGRESSION
3.1 Perturbations to Explanatory Variables
3.2 Perturbations to a Single Column
3.3 Perturbations to a Single Row
3.4 Perturbations to the Entire Design Matrix

4 LOCAL ASSESSMENT FOR OPTIMAL ESTIMATING FUNCTIONS
4.1 Inferential Platform
4.2 Local Assessment of Perturbations
4.3 Acceleration, Curvature and Normal Curvature
4.4 Algebraic Expressions for Local Assessment
4.5 Computational Formulas for J and P
4.6 Local Assessment for Maximum Likelihood Inference
4.7 First-order Assessment of the Log Generalized Variance
4.8 Local Assessment for Quasi-likelihood Inference

5 LOCAL ASSESSMENT OF PREDICTIONS
5.1 Prediction via First Principles
5.2 Predictive Likelihood
5.3 Influence Measures for Prediction
5.4 Estimation and Prediction in Linear Mixed Models
5.5 Local Influence Methods for Linear Mixed Models
5.6 One-way Random Effects Model
5.7 Residual Variance Perturbation Scheme
5.8 Cluster Variance Perturbation Schemes
5.9 Application to Corn Crop Data

6 MULTI-STAGE ESTIMATIONS AND FUTURE RESEARCH
6.1 Local Assessment in Multi-stage Estimations
6.2 Synopsis and Additional Future Research

APPENDICES

A MODEL FITTING INFORMATION FOR SCHOOL DATA
B MODEL FITTING INFORMATION FOR CORN CROP DATA

REFERENCES

BIOGRAPHICAL SKETCH









Abstract of Dissertation Presented to the Graduate School 

of the University of Florida in Partial Fulfillment of the 

Requirements for the Degree of Doctor of Philosophy 



LOCAL ASSESSMENT OF PERTURBATIONS 



By 

Glen Lawson Hartless 
August 2000 



Chairman: James G. Booth 
Major Department: Statistics 



Statistical models are useful and convenient tools for describing data. 
However, models can be adversely affected by mistaken information, poor
assumptions, and small subsets of influential data. Seemingly inconsequential 
changes to the observed data, the postulated model, or the inference technique may 
greatly affect the scientific conclusions made from model-fitting. This dissertation 
considers local assessment of perturbations in parametric and semi-parametric 
models. The effects of perturbation are assessed using a Taylor-series approximation 
in the vicinity of the fitted model, leading to graphical diagnostics that allow the 
analyst to identify sensitive parameter estimates and sources of undue influence. 

First, computational formulas are given to assess the influence of perturbations 
on linear combinations of the parameter estimates for maximum likelihood. A 
diagnostic plot that identifies sensitive fitted values and self-predicting observations 
in linear regression is introduced. Second, the eigensystem of the acceleration matrix 
for perturbing each element of the design matrix in linear regression is derived. 




This eigensystem leads to a diagnostic plot for identifying specific elements that 
are influential, and it is shown how the results relate to principal components, 
collinearity, and influence diagnostics. 

Next, it is shown that the conceptual framework and computational tools 
easily extend from maximum likelihood to inference that is performed using 
estimating equations. In particular, local assessment of parameter estimates and 
standard errors derived from optimal estimating functions is addressed. Algebraic 
expressions and computational formulas are given for both first- and second-order 
assessment, and applications such as quasi-likelihood are considered. 

Finally, the techniques are applied to linear mixed models in order to assess 
the influence of individual observations and clusters. While previous applications 
considered only estimation, the effects of perturbations on both estimation and 
prediction are addressed. It is shown that the effects on the fixed and random effect
inferences can be assessed simultaneously by using an influence measure based on 
the unstandardized predictive log-likelihood of the random effects. 









CHAPTER 1 
INTRODUCTION TO LOCAL INFLUENCE

1.1 Premise 

George Box (1982, p. 34) stated "all models are wrong; some models are useful," and A. F. Desmond (1997b, p. 117) stated "a robust method ought not to perform poorly in neighborhoods of the nominally true model." Indeed, conclusions based on a statistical model should be reliable under small perturbations to the data, the model's assumptions, or the particulars of the estimation method. In a seminal paper, Cook (1986) utilized the basic concept of perturbing the postulated model or the observed data to develop a variety of diagnostic tools. He showed how small perturbations can be used to develop easily-computable diagnostics for almost any model estimated by maximum likelihood (ML) estimation.

1.2 Measuring the Effects of Perturbations 
In a statistical analysis, parameter estimates are a crucial part of the 
decision-making process. These estimates are a function of the observed data, the 
postulated model, and the analyst's choice of statistical inference technique. Any 
one of these three aspects of a statistical-modeling situation is subject to uncertainty 
and doubt: observed data may be incorrectly recorded or subject to rounding 
error, a slightly different model may be just as plausible as the postulated model, 
and nuisance aspects of modeling such as choosing hyperparameters are subject to 
second-guessing. If small changes in these aspects of a statistical modeling problem 
create large changes in parameter estimates, the analysis is unreliable. Therefore, 
the change in parameter estimates due to perturbation can serve as a barometer for 
the robustness of the scientific conclusions. 



Following Cook (1986), a vector of $q$ perturbations is denoted as $\omega = (\omega_1, \omega_2, \ldots, \omega_q)'$, and the $q$-dimensional space of possible perturbations is denoted as $\Omega$. Included in this space is one point, $\omega_0$, that does not change the original formulation of the problem, and this is called the null perturbation. The perturbations are not parameters to be estimated, but rather they are known real-valued quantities introduced into the problem by the analyst. These perturbations might represent measurement errors, unequal variances, case weights, correlated errors, missing predictors, or some other change to the data, model, or inference method. Prudent choice of perturbations can lead to useful diagnostics. For example, if parameter estimates are sensitive to perturbing the contribution (weight) of a particular observation to estimation, it indicates that the observation is overly influential on the estimates.

For data $y$ and parameters $\theta$, let $L(\theta; y)$ denote the log-likelihood from the original postulated model, and let $L_\omega(\theta; y)$ denote the log-likelihood resulting from the inclusion of perturbations. If the perturbation specified is the null perturbation, then the two likelihoods are equivalent. The resulting maximum likelihood estimators (MLEs) of these two likelihoods are denoted as $\hat\theta$ and $\hat\theta_\omega$, respectively. Cook (1986) used the likelihood displacement to summarize parameter estimate changes under perturbation:

$$LD(\omega) = 2[L(\hat\theta; y) - L(\hat\theta_\omega; y)]. \eqno(1.1)$$

From inspection of (1.1), we see that the likelihood displacement is the difference of the original likelihood evaluated at two values: the original MLEs and the perturbed MLEs. Because $\hat\theta$ maximizes the original likelihood, $LD(\omega)$ is non-negative and achieves its minimum value of zero when $\omega = \omega_0$. As the perturbed MLEs become increasingly different from the original MLEs, $LD(\omega)$ becomes larger. Thus, the size of $LD(\omega)$ provides a measure of how much the perturbation affects parameter estimation.

Effects on a subset of parameters, $\theta_1$ from $\theta' = (\theta_1', \theta_2')$, can be obtained by using the profile likelihood displacement:

$$LD^{[1]}(\omega) = 2[L(\hat\theta_1, \hat\theta_2; y) - L(\hat\theta_{1\omega}, g(\hat\theta_{1\omega}); y)], \eqno(1.2)$$

where $\hat\theta_{1\omega}$ is from $\hat\theta_\omega' = (\hat\theta_{1\omega}', \hat\theta_{2\omega}')$, and $g(\hat\theta_{1\omega})$ is the vector that maximizes the profile log-likelihood, $L(\hat\theta_{1\omega}, \theta_2)$, with respect to $\theta_2$. The profile likelihood displacement allows influence to be "directed" through $\hat\theta_1$, with the other parameter estimates only changing in sympathy.

In the next section it is shown how to obtain diagnostic tools from a local influence analysis of the likelihood displacement and the profile likelihood displacement.

1.3 Local Influence Diagnostics

Although we could simply choose values for $\omega$ and compute $LD(\omega)$, this is not particularly useful for two reasons. First, $LD(\omega)$ is often not a closed-form function of $\omega$. Hence, if we wish to find its value, re-fitting the model may be necessary. Second, even if calculating $LD(\omega)$ were easy, we must somehow decide what values to use for $\omega$. This choice would be highly data-dependent and a nuisance for the analyst.

Therefore, we would like a methodology that can examine the information contained in the function $LD(\omega)$ and yet does not require a choice of actual values for $\omega$ nor involve re-estimating the model parameters. Following Cook (1986), this can be accomplished using the notion of curvature. Simple plots can then be used to identify whether estimation is sensitive to any particular elements of the $q$-dimensional perturbation.



Figure 1.1: Schematic of a plane curve with its tangent line. The angle between the tangent line and the x-axis is $\phi$.



1.3.1 Curvature of a Plane Curve

Curvature is a measure of how quickly a curve turns or twists at a particular point (Swokowski 1988). Let $(f(t), g(t))$ be a smooth plane curve. The arc length parameter $s$ of such a curve is the distance travelled along the curve from a fixed point $A$ to another point on the curve (Figure 1.1). Letting $t_A$ denote the value of $t$ that yields point $A$, $s$ can be computed as

$$s(t) = \int_{t_A}^{t} \left( [f'(u)]^2 + [g'(u)]^2 \right)^{1/2} du, \eqno(1.3)$$

and we may express the curve in terms of $s$: $(f(s^{-1}(s)), g(s^{-1}(s)))$. Additionally, let $T(s)$ be the tangent line at a point on the curve, and let $\phi$ denote the angle between $T(s)$ and the x-axis.

Curvature is defined as the absolute value of the rate of change in $\phi$ with respect to $s$:

$$C = \left| \frac{d\phi}{ds} \right|. \eqno(1.4)$$

A plane curve that is turning quickly at $(f(t), g(t))$ has a larger curvature than a curve that is nearly straight at that point.



Figure 1.2: Curve with two circles of curvature

The reciprocal of curvature is called the radius of curvature, and it is the radius of the best-fitting circle at that point on the curve. These circles of curvature have the same first and second derivatives and tangent line as the curve and lie on its concave side. Figure 1.2 shows the curve $(\frac{1}{2}\sin(t), \frac{1}{3}\sin(2t))$ and two of its circles of curvature. The one on the left occurs when $t = \pi/3$, while the other one occurs when $t = \arccos(0)$. Their curvatures are 6 and 9/8, respectively. The expression "turning on a dime" refers to a sharp turn: one in which the radius of curvature is small, and consequently the curvature of the path is large.
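The curve's coefficients and the two curvature values are partially garbled in the scanned source; the figures above are a reconstruction that can be checked numerically with the parametric curvature formula stated as (1.5) below. A minimal sketch, assuming the curve is $(\frac{1}{2}\sin t, \frac{1}{3}\sin 2t)$:

```python
import numpy as np

def curvature(t):
    # C = |f'g'' - g'f''| / (f'^2 + g'^2)^(3/2) for f = sin(t)/2, g = sin(2t)/3
    f1, g1 = 0.5 * np.cos(t), (2.0 / 3.0) * np.cos(2 * t)    # first derivatives
    f2, g2 = -0.5 * np.sin(t), -(4.0 / 3.0) * np.sin(2 * t)  # second derivatives
    return abs(f1 * g2 - g1 * f2) / (f1**2 + g1**2) ** 1.5

print(curvature(np.pi / 3))      # ~6.0
print(curvature(np.arccos(0)))   # 9/8 = 1.125
```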

The formula for curvature is

$$C = \frac{|f'(t)g''(t) - g'(t)f''(t)|}{\left( [f'(t)]^2 + [g'(t)]^2 \right)^{3/2}}. \eqno(1.5)$$

We are particularly interested in assessing curvature at the local minimum of a function from $x$ to $y$, i.e. when $x = t$. For this special case, curvature measures how quickly the function increases from that minimum value. In this case, (1.5) simplifies to the acceleration of the curve:

$$C = \left| \frac{d^2 g(x)}{dx^2} \right|_{x = x_0}, \eqno(1.6)$$

where $g(x_0)$ is the function's minimum. This result can be applied to $LD(\omega)$ at $\omega_0$ when there is only a single perturbation, $\omega$. Here, the curvature is given by

$$C = \left| \frac{d^2 LD(\omega)}{d\omega^2} \right|_{\omega = \omega_0}, \eqno(1.7)$$

and it measures how quickly the parameter estimates change when $\omega$ is moved away from the null perturbation. A large curvature indicates that even if $\omega$ is close, or local, to $\omega_0$, the likelihood displacement increases substantially. Hence, a large curvature indicates that even a slight perturbation can be very influential on parameter estimates.

Figure 1.3: Likelihood displacement with directional vectors

1.3.2 Using Plane Curves to Describe $LD(\omega)$

The curvature of plane curves can also be used to describe $LD(\omega)$ when $q > 1$. Now $\omega$ can be moved away from $\omega_0$ in different directions, as shown by the vectors in Figure 1.3 for a two-dimensional perturbation. These vectors can be written as

$$\omega_v(a) = \omega_0 + a v, \eqno(1.8)$$

where the vector $v$ determines the direction and the value $a$ determines the position relative to $\omega_0$.

Figure 1.4: Intersection of plane and surface creates space curve

Let $P_v$ denote a plane that contains $\omega_v(a)$ and extends directly upward from $\Omega$ to intersect $LD(\omega)$ (Figure 1.4). The intersection of $P_v$ and $LD(\omega)$ forms a space curve, $(\omega_v(a)', LD(\omega_v(a)))'$.* Because this space curve is contained within the plane $P_v$, its curvature can be obtained as if it were a plane curve, $(a, LD(\omega_v(a)))$. Thus the curvature of $LD(\omega_v(a))$ at $\omega_0$ is simply

$$C_v = \left| \frac{d^2 LD(\omega_v(a))}{da^2} \right|_{a=0}. \eqno(1.9)$$

* More formally, these space curves are called normal sections. The term comes from the fact that each plane, $P_v$, contains the surface's principal normal vector, the vector that is orthogonal to the surface's tangent plane at $\omega_0$. Curvatures of the normal sections are referred to as normal curvatures.

Different directions produce curvatures of various sizes. The normalized vector that produces the largest curvature is denoted $v_{max}$ and is of particular interest.



1.3.3 Direction of Maximum Curvature, $v_{max}$

The relative sizes of the elements of $v_{max}$ indicate which elements of $\omega$ are most influential on the parameter estimates. For example, if the $i$th element of $v_{max}$ is large, then $\omega_i$ is a key player in producing the largest curvature of $LD(\omega)$ at the null perturbation, i.e. in producing an immediate change in parameter estimates.

The direction of maximum curvature is the main diagnostic of Cook's local influence methodology. Typically, $v_{max}$ is standardized to have length one and is then plotted against indices $1, 2, \ldots, q$. If any elements are particularly large relative to the others, they stand out in the plot. The analyst then may want to take steps to alleviate the sensitivity to the perturbation. What, if any, steps are taken depends upon the perturbation scheme and the particulars of the data analysis.

For some models and perturbation schemes, the formula for $v_{max}$ is known explicitly, while in other situations it requires some moderate computations. In analyses such as regression, $v_{max}$ is typically a function of familiar quantities.

1.4 Perturbations in Regression

Consider the normal linear regression model:

$$Y = X\beta + \epsilon, \eqno(1.10)$$

where $Y$ is an $n \times 1$ vector of responses, $X$ is a known $n \times k$ model matrix of rank $k$, $\beta$ is a $k \times 1$ vector of unknown parameters, and $\epsilon$ is an $n \times 1$ vector of independent, identically distributed normal random variables with mean 0 and variance $\sigma^2 > 0$. Hence, $Y \sim N(X\beta, \sigma^2 I_n)$, and $\theta = (\beta_1, \ldots, \beta_k, \sigma^2)'$.

1.4.1 Scheme 1: Measurement Error in the Response

In order to examine the effects of measurement error in the response, Schwarzmann (1991) introduced perturbations to the observed data. He considered fitting a model to the observed data set with slight perturbations: $y_i + \omega_i$, for $i = 1, \ldots, q$. Hence, $\omega_0$ is a vector of zeroes for this perturbation scheme. By letting $q = n$, all of the observations can be perturbed simultaneously. This allows the analyst to determine which observations would have the most effect on parameter estimates in the event that there is measurement or rounding error.



Table 1.1: Scottish hills race data

Observation   Record Time (min.)   Distance (miles)   Climb (ft)
     1              16.083                2.5              650
     2              48.350                6.0             2500
     3              33.650                6.0              900
     4              45.600                7.5              800
     5              62.267                8.0             3070
     6              73.217                8.0             2866
     7             204.617               16.0             7500
     8              36.367                6.0              800
     9              29.750                5.0              800
    10              39.750                6.0              650
    11             192.667               28.0             2100
    12              43.050                5.0             2000
    13              65.000                9.5             2200
    14              44.133                6.0              500
    15              26.933                4.5             1500
    16              72.250               10.0             3000
    17              98.417               14.0             2200
    18              78.650                3.0              350
    19              17.417                4.5             1000
    20              32.567                5.5              600
    21              15.950                3.0              300
    22              27.900                3.5             1500
    23              47.633                6.0             2200
    24              17.933                2.0              900
    25              18.683                3.0              600
    26              26.217                4.0             2000
    27              34.433                6.0              800
    28              28.567                5.0              950
    29              50.500                6.5             1750
    30              20.950                5.0              500
    31              85.583               10.0             4400
    32              32.383                6.0              600
    33             170.250               18.0             5200
    34              28.100                4.5              850
    35             159.833               20.0             5000

Source: Atkinson (1986)



1.4.1.1 Influence on the full vector of parameter estimates

Atkinson (1986) analyzed a data set of record times for 35 different hill races, as recorded by the Scottish Hill Runners Association (Table 1.1). Scatter plots indicate the positive association of the three variables: winning time, race distance, and race climb (Figure 1.5). The influence of perturbations on the parameter estimates from a regression of winning time on DISTANCE and CLIMB is of interest. For convenience, the regressors and response have been centered and scaled, thereby eliminating the need for an intercept. Thus, the full vector of $p$ parameters is $\theta = (\beta_{\rm dist}, \beta_{\rm climb}, \sigma^2)'$.

Figure 1.6 shows $v_{max}$ plotted against the indices 1 through $q = n = 35$ for the Scottish hills data. Element 18, which corresponds to perturbation of the 18th observation, is the dominant contributor to the direction of maximum curvature. This observation is from a short race with little CLIMB, and yet it has a large record time. Similar races have much smaller record times, indicating that race 18 is an outlier. Atkinson (1986) has actually suggested that the winning time for this race may have been 18.65 minutes rather than 1 hour and 18.65 minutes.

Interestingly, it can be shown that the direction of maximum curvature for 
Scheme 1 is proportional to the residuals. Thus, if we standardized the vector 
of residuals to length 1 and plotted them against case indices, we would obtain 
Figure 1.6. This implies that parameter estimates are most sensitive to a set of 
small additive measurement errors when those errors tend to exaggerate (or correct) 
the model's error in estimating each data point, with the most exaggeration (or 
correction) occurring for the most poorly predicted points. 
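This proportionality is easy to check numerically: standardize the residual vector to length one and it reproduces the plotted $v_{max}$. A minimal sketch using the Table 1.1 data (the variable names are my own):

```python
import numpy as np

# Scottish hills data from Table 1.1: time (min), distance (miles), climb (ft)
time = np.array([16.083, 48.350, 33.650, 45.600, 62.267, 73.217, 204.617,
                 36.367, 29.750, 39.750, 192.667, 43.050, 65.000, 44.133,
                 26.933, 72.250, 98.417, 78.650, 17.417, 32.567, 15.950,
                 27.900, 47.633, 17.933, 18.683, 26.217, 34.433, 28.567,
                 50.500, 20.950, 85.583, 32.383, 170.250, 28.100, 159.833])
dist = np.array([2.5, 6, 6, 7.5, 8, 8, 16, 6, 5, 6, 28, 5, 9.5, 6, 4.5, 10,
                 14, 3, 4.5, 5.5, 3, 3.5, 6, 2, 3, 4, 6, 5, 6.5, 5, 10, 6,
                 18, 4.5, 20])
climb = np.array([650, 2500, 900, 800, 3070, 2866, 7500, 800, 800, 650, 2100,
                  2000, 2200, 500, 1500, 3000, 2200, 350, 1000, 600, 300,
                  1500, 2200, 900, 600, 2000, 800, 950, 1750, 500, 4400, 600,
                  5200, 850, 5000])

std = lambda z: (z - z.mean()) / z.std()              # center and scale
X = np.column_stack([std(dist), std(climb)])
y = std(time)
r = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]      # residuals
v_max = r / np.linalg.norm(r)                         # standardized to length 1
print(np.argmax(np.abs(v_max)) + 1)                   # element 18 dominates
```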



Figure 1.5: Hills data; scatter plots with highlighted cases. (a) Winning time vs. race distance; (b) winning time vs. race climb; (c) race climb vs. race distance. Cases 7, 11, and 18 are highlighted.



Figure 1.6: Hills data; response perturbation; $v_{max}$ for $\hat\theta$



Figure 1.7: Hills data; response perturbation. (a) $v_{max}$ for directed influence on $\hat\beta_{\rm dist}$; (b) $v_{max}$ for directed influence on $\hat\beta_{\rm climb}$.

1.4.1.2 Influence on a subset of parameter estimates

Plotting $v_{max}$ for the profile likelihood displacement, $LD^{[1]}(\omega)$, can indicate which components of $\omega$ affect a subset of parameters. Examining influence on $\hat\theta_1 = \hat\sigma^2$ results in the same $v_{max}$ as the original likelihood displacement, i.e. it is proportional to the residuals. The directions of maximum curvature for $\hat\theta_1 = \hat\beta_{\rm dist}$ and $\hat\theta_1 = \hat\beta_{\rm climb}$ are shown in Figures 1.7(a) and 1.7(b). The first plot indicates that small measurement error in race 11's winning time is most influential on $\hat\beta_{\rm dist}$. In the plot of $v_{max}$ for $\hat\beta_{\rm climb}$, elements 7 and 11 stand out. The fact that the two elements have opposite signs can be used for interpretation. They indicate that maximum curvature is primarily achieved by perturbing winning times 7 and 11 in opposite directions (i.e. making one observation smaller and the other larger). Thus, if the record winning times for races 7 and 11 were slightly closer together or farther apart, this would affect the slope estimate for CLIMB.

1.4.1.3 Plotting options

Note that the standardized $v_{max}$ is only unique up to sign change. For example, Figure 1.8 displays the same standardized vector as Figure 1.6, but with all of the elements' signs switched. This suggests an alternative plot in which the signs of the elements are suppressed: a star plot.



Figure 1.8: Hills data; alternate rendering of $v_{max}$



Here, each element is displayed as a point, with the distance from the origin equal to its absolute value. The $q$ elements are simply displayed around the point (0,0), starting in the first quadrant and ending in the fourth quadrant. The plot is completed by labelling the points with indices. Labelling all points can result in a cluttered plot (Figure 1.9(a)), and so a better option is to label only the largest elements (Figure 1.9(b)). In these renderings, element 18 is again noticeably large, but our attention is also drawn to element 7. This observation has the characteristics of an influential observation: it has the highest climb and one of the longest distances (Figure 1.5(c)), as well as the second largest residual.

Unlike the index plot, the star plot does not give the illusion that the 
indices are in a specific order. However, the suppression of element signs can be an 
advantage or a disadvantage. The advantage is that when looking at several plots 
of directional vectors, suppressing the signs helps the analyst identify patterns by 
emphasizing element size. The disadvantage is that the diagnostic interpretation of 
the signs is lost. 
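A star plot is simple to construct from any directional vector. One plausible construction, consistent with the description above (the exact angular layout used in the dissertation's figures is an assumption here), spaces the $q$ points at equal angles around the origin and uses $|v_i|$ as the radius:

```python
import numpy as np
import matplotlib.pyplot as plt

def star_plot(v, label_top=3):
    # Place element i at angle theta_i with radius |v_i|, suppressing signs
    q = len(v)
    theta = 2 * np.pi * (np.arange(q) + 0.5) / q   # sweep quadrants I -> IV
    x, y = np.abs(v) * np.cos(theta), np.abs(v) * np.sin(theta)
    plt.scatter(x, y, s=10)
    # Label only the largest elements to avoid clutter
    for i in np.argsort(np.abs(v))[-label_top:]:
        plt.annotate(str(i + 1), (x[i], y[i]))
    plt.gca().set_aspect("equal")
    plt.show()
```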






Figure 1.9: Hills data; $|v_{max}|$ as a star plot. (a) All elements numbered; (b) largest elements numbered.



1.4.2 Scheme 2: Perturbations to Measurement Precision

Another assumption that can be perturbed is that of equal variance (Cook 1986). This is accomplished by letting $V(y_i) = \sigma^2/\omega_i$ for $i = 1, \ldots, n$. Here the null perturbation is $\omega_0 = 1_n$.

With this perturbation scheme, we deem that some observations are observed 
with more precision than others. The resulting perturbed MLEs weight each case 
according to its perturbed variance. If certain cases are influential, then even a 
slight down-weighting or up-weighting of them produces large displacements. 

This perturbation scheme has an important connection to the widely-used influence measure Cook's Distance (Cook 1977). Recall that the $i$th Cook's Distance is a measure of the change in regression parameter estimates that results from deleting the $i$th observation. Clearly a deleted observation provides no information for finding $\hat\beta$. Similarly, as the $i$th perturbation is decreased from 1, the $i$th observation provides less and less information to the estimation process owing to its increased variance. Thus, case-deletion can be thought of as a rather drastic perturbation: one that results in a variance $\to \infty$ for the $i$th observation. Case-deletion is referred to as a global influence technique because it assesses the effect of perturbations that are not local to the null perturbation in $\Omega$.

1.4.2.1 Influence on the full vector of parameter estimates 

Figure 1.10(a) shows the direction of maximum curvature for the Scottish 
hills data set. Again, element 18 dominates the direction of maximum curvature. 
This implies that the precision of observation 18 is the most influential on the full 
set of parameter estimates. 

1.4.2.2 Influence on $\hat\sigma^2$

Using the profile likelihood displacement, influence directed upon $\hat\sigma^2$ alone can be assessed. Observation 18 is again implicated by $v_{max}$ as being most locally influential (Figure 1.10(b)).

1.4.2.3 Influence on individual regression parameter estimates

Figure 1.10(d) shows $v_{max}$ when isolating effects on $\hat\beta_{\rm dist}$. Although there are several points that stand out, observation 7 is identified as being most influential since element 7 is large. This race is also implicated as being locally influential in the plot of $v_{max}$ for $\hat\beta_{\rm climb}$ (Figure 1.10(c)). In fact, the DFBETAS diagnostic (Belsley et al. 1980) also identifies race 7 as being influential on $\hat\beta_{\rm climb}$. Recall that the value of $DFBETAS_{ij}$ measures how the $j$th parameter estimate is affected by deleting the $i$th observation. Writing the $n$ DFBETAS for the $j$th parameter estimate in vector form, it is shown in Section 2.5 that this vector is proportional to the following Hadamard (element-wise) product of vectors:



$$\begin{pmatrix} \frac{1}{1-h_{11}} \\ \frac{1}{1-h_{22}} \\ \vdots \\ \frac{1}{1-h_{nn}} \end{pmatrix} \bullet \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix} \bullet \begin{pmatrix} r_{1j} \\ r_{2j} \\ \vdots \\ r_{nj} \end{pmatrix}. \eqno(1.11)$$

Figure 1.10: Hills data; variance perturbation. (a) $v_{max}$ for $\hat\theta$; (b) $v_{max}$ for $\hat\sigma^2$; (c) $v_{max}$ for $\hat\beta_{\rm climb}$; (d) $v_{max}$ for $\hat\beta_{\rm dist}$; (e) $v_{max}$ for $\hat\beta$; (f) $v_2$ for $\hat\beta$.






Here, $h_{ii}$ is the $i$th diagonal element of the "hat" matrix $X(X'X)^{-1}X'$, $r_i$ is the $i$th residual, and $r_{ij}$ is the $i$th residual from regressing $X_j$ on the other columns of $X$. It can be shown (see Section 2.5) that $v_{max}$ for directing the effects of the variance perturbation on $\hat\beta_j$ is proportional to

$$\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix} \bullet \begin{pmatrix} r_{1j} \\ r_{2j} \\ \vdots \\ r_{nj} \end{pmatrix}. \eqno(1.12)$$

The elements of the two vectors differ only by a factor of $\frac{1}{1-h_{ii}}$. In fact, standardizing the vector of 35 DFBETAS for $\hat\beta_{\rm climb}$ to length 1 and plotting them (Figure 1.11) produces a figure nearly identical to Figure 1.10(c).

Figure 1.11: Standardized vector of DFBETAS for $\hat\beta_{\rm climb}$



1.4.2.4 Influence on both regression parameter estimates

Figure 1.10(e) displays the direction of maximum curvature for directed influence on $\hat\beta$. Plot (f) is the direction of largest curvature under the constraint of orthogonality to $v_{max}$. The two plots are very similar to the directed influence plots in (c) and (d).

1.5 Basic Perturbations

As an alternative to finding the direction of maximum curvature, we may utilize basic perturbations. The $i$th basic perturbation is the same as the null perturbation except for an additive change of 1 in its $i$th element. For example, the first basic perturbation is $\omega_0 + (1, 0, \ldots, 0)'$, and corresponds to moving a distance of 1 along the first axis in $\Omega$. Typically, the curvatures for each of the $q$ basic perturbations would be computed and then plotted together.

As an example, consider applying the variance perturbation to a linear regression, and examining directed influence on $\hat\beta$. In this case, the $i$th basic curvature, $C_{b_i}$, is a local influence analog to the $i$th Cook's Distance, because it corresponds to re-weighting a single observation. It is shown in Section 2.7.2.1 that the $i$th basic perturbation for this scheme measures influence on the $i$th fitted value. Figure 1.12(a) plots the 35 curvatures for the basic perturbations applied to the Scottish hills data, while Figure 1.12(b) plots the 35 Cook's Distances. Although the scales differ, they provide similar information. Both plots draw attention to race 7, but they differ somewhat in their emphasis of races 11 and 18.

Figure 1.12: Hills data; basic curvatures for the variance perturbation and Cook's Distances. (a) Basic curvatures; (b) Cook's Distances.
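The basic curvatures for this scheme have a closed form: using the acceleration matrix derived in Section 2.5, the $i$th basic curvature for directed influence on $\hat\beta$ is the $i$th diagonal element of $F$, namely $(2/\hat\sigma^2)h_{ii}r_i^2$ (equation (2.20)). A minimal sketch comparing them with Cook's Distances (function and variable names are my own):

```python
import numpy as np

def basic_curvatures_and_cooks_d(X, y):
    n, k = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
    h = np.diag(H)
    r = y - H @ y                                  # residuals
    sig2_ml = r @ r / n                            # ML estimate of sigma^2
    s2 = r @ r / (n - k)                           # mean squared error
    curv = 2.0 * h * r**2 / sig2_ml                # diagonal of F, per (2.20)
    cooks = h * r**2 / (k * s2 * (1 - h) ** 2)     # Cook's Distances
    return curv, cooks
```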

1.6 Brief Literature Review

Local influence techniques have been applied to a number of statistical models, most notably univariate and multivariate linear models (Cook 1986; Cook and Weisberg 1991; Lawrance 1991; Schwarzmann 1991; Tsai and Wu 1992; Wu and Luo 1993a; Kim 1995; Kim 1996c; Tang and Fung 1996; Fung and Tang 1997; Kim 1998; Rancel and Sierra 1999). A number of articles have also appeared on applications in generalized linear models (GLMs) (Cook 1986; Thomas and Cook 1989; Thomas and Cook 1990; Thomas 1990; Lee and Zhao 1997) and survival analysis (Pettitt and Baud 1989; Weissfeld and Schneider 1990; Weissfeld 1990; Escobar and Meeker 1992; Barlow 1997; Brown 1997). Other applications include linear mixed models (Beckman et al. 1987; Lesaffre and Verbeke 1998; Lesaffre et al. 1999), principal components analysis (Shi 1997), factor analysis (Jung et al. 1997), discriminant analysis (Wang et al. 1996), optimal experimental design (Kim 1991), structural equations (Cadigan 1995; Wang and Lee 1996), growth curve models (Pan et al. 1996; Pan et al. 1997), spline smoothing (Thomas 1991), Box-Cox transformations (Lawrance 1988; Kim 1996b), nonlinear models (St. Laurent and Cook 1993; Wu and Wan 1994), measurement error models (Zhao and Lee 1995; Lee and Zhao 1996), and elliptical linear regression models (Galea et al. 1997).

Many authors have presented modifications of, extensions to, and theoretical justifications for the basic methodology (Vos 1991; Schall and Gonin 1991; Schall and Dunne 1992; Farebrother 1992; Wu and Luo 1993a; Billor and Loynes 1993; Kim 1996a; Fung and Kwan 1997; Lu et al. 1997; Cadigan and Farrell 1999; Poon and Poon 1999). Extensions to Bayesian methodology have also been considered (McCulloch 1989; Lavine 1992; Pan et al. 1996).

1.7 Outline

In Chapter 2, computational formulas for local influence analysis are discussed 
and derived for the two perturbation schemes discussed in this chapter. Chapter 3 
contains derivations and examples of local influence diagnostics for linear regression 
under perturbations to the X matrix. In Chapter 4, the methodology is extended to 
a general class of influence measures and estimation techniques. Computational tools 
are also provided. In Chapter 5, inference and local influence analysis for prediction 
problems are discussed. Techniques for assessing estimation and prediction in linear 
mixed models are developed and applied to data. In Chapter 6, some preliminary 
results that can be applied to multi-stage estimations are presented. The chapter 
also contains commentary on additional areas of future research. The appendices 
contain details for data analyses. 



CHAPTER 2
COMPUTATIONAL ISSUES IN LOCAL INFLUENCE ANALYSIS

In this chapter, computational formulas for local influence are reviewed (Cook 1986; Beckman et al. 1987). To demonstrate the mechanics of the formulas, detailed derivations of previous results for the regression model (Cook and Weisberg 1991; Schwarzmann 1991) will be presented with minor additions. Next, the issue of how to obtain objective benchmarks is considered, and some novel solutions are outlined. Finally, local influence is discussed under reparameterizations of the perturbation space $\Omega$ and the parameter space $\Theta$. By applying previous results (Escobar and Meeker 1992; Pan et al. 1997), some new diagnostic tools for influence on fitted values in linear regression are derived.

2.1 Curvatures and the Direction of Maximum Curvature

Cook (1986) showed that the curvature of $LD(\omega)$ in direction $v$ can be expressed as

$$C_v = 2 v' \Delta' (-\ddot{L}^{-1}) \Delta v, \eqno(2.1)$$

where $-\ddot{L}$ is the observed information matrix of the original model,

$$\ddot{L} = \frac{\partial^2 L(\theta; y)}{\partial\theta\,\partial\theta'}\bigg|_{\theta=\hat\theta},$$

and $\Delta_{p\times q}$ has elements

$$\Delta_{ij} = \frac{\partial^2 L_\omega(\theta; y)}{\partial\theta_i\,\partial\omega_j}\bigg|_{\theta=\hat\theta,\,\omega=\omega_0}$$

for $i = 1, \ldots, p$ and $j = 1, \ldots, q$. This result is proved in Chapter 4 as a special case of Theorem 4.2.

The $q \times q$ matrix $F = 2\Delta'(-\ddot{L}^{-1})\Delta$ is the acceleration matrix. It is easily obtained because $\Delta$ is a function of $L_\omega(\hat\theta; y)$, and $-\ddot{L}^{-1}$ is typically computed as part of ML estimation since it serves as an estimated asymptotic covariance matrix for the parameter estimates. Hence, computing the curvature in a particular direction is straightforward. In addition, the diagonal elements of the acceleration matrix are the $q$ basic curvatures. Finally, it is well known how to obtain the maximum value of a quadratic form such as (2.1) (Johnson and Wichern 1992, pp. 66-68). Specifically, $C_{max}$ is the largest eigenvalue of $F$, and $v_{max}$ is the corresponding eigenvector. Hence, finding $C_{max}$ and $v_{max}$ is numerically feasible.

The acceleration matrix can provide additional information. For example, the second largest eigenvalue is the largest curvature in a direction that is orthogonal to $v_{max}$. Hence, its corresponding eigenvector indicates a direction of large local influence that is unrelated to the first one. Additional elements of the spectral decomposition of $F$ can be used in a similar fashion.

A consequence of this derivation is that it shows how local influence techniques reduce the dimensionality of examining $LD(\omega)$ via a small number of curvatures and directional vectors. It is well known that the number of non-zero eigenvalues of a symmetric matrix such as $F$ is equal to its rank. Since it is a product of other matrices, the rank of $F$ is no more than the minimum of their ranks, which is typically $p$.

In the regression examples given previously, $p = 3$ and $q = 35$, meaning that the local influence approach can summarize the 35-dimensional surface $LD(\omega)$ with three curvatures and directional vectors. Similarly, $LD^{[1]}(\omega)$ was summarized with just two, as shown in Figures 1.10(e) and (f).

Despite the advantages of the eigen-analysis of $F$, one difficulty that arises is the occurrence of eigenvalues with multiplicities greater than one. For example, if $C_{max} = \lambda_1$ has multiplicity greater than 1, then there is no unique $v_{max}$ to plot. In these situations it may be more prudent to look at directed influence on single parameter estimates.

2.2 Local Influence Analysis of a Profile Likelihood Displacement

The profile likelihood displacement, $LD^{[1]}(\omega)$, was presented in Section 1.2 as a measure of influence directed through $\hat\theta_1$. Cook (1986) showed that the acceleration matrix can be expressed as

$$F^{[1]} = -2\Delta'(\ddot{L}^{-1} - B_{22})\Delta, \eqno(2.2)$$

where

$$B_{22} = \begin{pmatrix} 0 & 0 \\ 0 & \ddot{L}_{22}^{-1} \end{pmatrix}$$

and $\ddot{L}_{22}$ comes from the partition

$$\ddot{L} = \begin{pmatrix} \ddot{L}_{11} & \ddot{L}_{12} \\ \ddot{L}_{21} & \ddot{L}_{22} \end{pmatrix},$$

where

$$\ddot{L}_{ij} = \frac{\partial^2 L(\theta; y)}{\partial\theta_i\,\partial\theta_j'}\bigg|_{\theta=\hat\theta}.$$

This result is proved in Chapter 4 as a special case of Theorem 4.2.

Beckman et al. (1987) considered directing influence on a single parameter estimate. They noted that in this case, the acceleration matrix is the outer product of a vector, and hence has only one non-zero eigenvalue. Without loss of generality, suppose that we wish to direct influence on the first parameter estimate in $\hat\theta$. The eigensystem is then given by

$$\lambda = 2\|\Delta' T\|^2 \qquad v \propto \Delta' T, \eqno(2.3)$$

where $T$ is the vector for which $TT' = -(\ddot{L}^{-1} - B_{22})$. It can be expressed as

$$T = \frac{1}{\sqrt{-c}}\begin{pmatrix} 1 \\ -\ddot{L}_{22}^{-1}\ddot{L}_{21} \end{pmatrix}, \eqno(2.4)$$

where $c = \ddot{L}_{11} - \ddot{L}_{12}\ddot{L}_{22}^{-1}\ddot{L}_{21}$.

2.3 Building Blocks of Local Influence for Regression

In this section, the building blocks of $F$ for a multiple linear regression are constructed. Subsequent sections apply these results to the response and variance perturbations used in Chapter 1.

2.3.1 Notation

• Unless otherwise noted, the derivations assume the presence of an intercept in the model so that the residuals sum to zero. Centering and scaling the regressors tends to facilitate interpretation, but is not necessary.

• The number of regressors is $k = p - 1$.

• $H = X(X'X)^{-1}X'$.

• $X_i$ represents the $i$th column of $X$, and $x_i'$ represents the $i$th row of $X$.

• $r = (I_n - H)y$ represents the residuals.

• $r_i$ represents the residuals from fitting a model to the $i$th predictor using the other predictors. These can be considered as values of an adjusted predictor (i.e. adjusted for relationships with other predictors). Note: $Hr_i = r_i$ and $(I - H)r_i = 0$.

• $X_{(i)}$ represents the $X$ matrix with the $i$th column removed. Similarly, $\beta_{(i)}$ is the regression parameter vector with the $i$th element removed, and $\Delta_{(i)}$ is $\Delta$ without the $i$th row.



2.3.2 Acceleration Matrix

The ML estimates of $\beta$ are the same as those of ordinary least squares, $\hat\beta = (X'X)^{-1}X'y$, and the ML estimate of $\sigma^2$ is $\hat\sigma^2 = \frac{SSE}{n}$. The $p \times p$ estimated asymptotic variance-covariance matrix of the MLEs is

$$-\ddot{L}^{-1} = \begin{pmatrix} \hat\sigma^2(X'X)^{-1} & 0 \\ 0' & \frac{2\hat\sigma^4}{n} \end{pmatrix}.$$

Partitioning $\Delta'$ into $[\Delta_\beta' \ \Delta_{\sigma^2}']$, the acceleration matrix is given by

$$F = 2\Delta'(-\ddot{L}^{-1})\Delta = 2\hat\sigma^2\Delta_\beta'(X'X)^{-1}\Delta_\beta + \frac{4\hat\sigma^4}{n}\Delta_{\sigma^2}'\Delta_{\sigma^2}.$$

2.3.3 Acceleration Matrix for Directed Influence on $\hat\sigma^2$

The acceleration matrix for directed influence on $\hat\sigma^2$ utilizes the matrix $\ddot{L}^{-1} - B_{22}$, which is given by

$$\ddot{L}^{-1} - B_{22} = \begin{pmatrix} 0 & 0 \\ 0' & -\frac{2\hat\sigma^4}{n} \end{pmatrix}.$$

This implies that the acceleration matrix for directed influence on $\hat\sigma^2$ is

$$F^{[\hat\sigma^2]} = -2\Delta'(\ddot{L}^{-1} - B_{22})\Delta = \frac{4\hat\sigma^4}{n}\Delta_{\sigma^2}'\Delta_{\sigma^2}. \eqno(2.5)$$

Noting that $\Delta_{\sigma^2}'$ is in fact a $q \times 1$ vector, there will be a single eigenvalue and eigenvector given by

$$\lambda_1 = \frac{4\hat\sigma^4}{n}\|\Delta_{\sigma^2}'\|^2 \qquad v_1 \propto \Delta_{\sigma^2}'.$$

These quantities are the maximum curvature and its direction, respectively.



2.3.4 Acceleration Matrix for Directed Influence on $\hat\beta$

When directing influence on $\hat\beta$, we have

$$\ddot{L}^{-1} - B_{22} = \begin{pmatrix} -\hat\sigma^2(X'X)^{-1} & 0 \\ 0' & 0 \end{pmatrix}.$$

The resulting acceleration matrix is

$$F^{[\hat\beta]} = 2\hat\sigma^2\Delta_\beta'(X'X)^{-1}\Delta_\beta. \eqno(2.6)$$

Note that this implies that directed local influence on $\hat\beta$ is equivalent to local influence for $\beta$ when treating $\sigma^2$ as known and equal to its MLE.

2.3.5 Acceleration Matrix for Directed Influence on a Single Regression Parameter

By applying the general results of Beckman et al. (1987), we can construct $T$ (2.4) for directed influence on $\hat\beta_1$ in a regression model. Using results for the inverse of a partitioned matrix, $TT' = -(\ddot{L}^{-1} - B_{22})$ can be expressed as

$$TT' = \frac{\hat\sigma^2}{d}\begin{pmatrix} 1 & -X_1'X_{(1)}(X_{(1)}'X_{(1)})^{-1} & 0 \\ -(X_{(1)}'X_{(1)})^{-1}X_{(1)}'X_1 & (X_{(1)}'X_{(1)})^{-1}X_{(1)}'X_1X_1'X_{(1)}(X_{(1)}'X_{(1)})^{-1} & 0 \\ 0 & 0 & 0 \end{pmatrix}, \eqno(2.7)$$

where $d = X_1'X_1 - X_1'X_{(1)}(X_{(1)}'X_{(1)})^{-1}X_{(1)}'X_1$.

We denote $(X_{(1)}'X_{(1)})^{-1}X_{(1)}'X_1$ by $\gamma_1$, as it is the vector of regression parameter estimates from modeling $X_1$ as a linear combination of the other explanatory variables. This simplifies the notation of (2.7) greatly:

$$-(\ddot{L}^{-1} - B_{22}) = TT' = \frac{\hat\sigma^2}{d}\begin{pmatrix} 1 & -\gamma_1' & 0 \\ -\gamma_1 & \gamma_1\gamma_1' & 0 \\ 0' & 0 & 0 \end{pmatrix}. \eqno(2.8)$$

Using these results, we find

$$T = \frac{\hat\sigma}{\sqrt{d}}\begin{pmatrix} 1 \\ -\gamma_1 \\ 0 \end{pmatrix}, \eqno(2.9)$$

and this leads to the following expressions for the maximum curvature and its direction when directing influence on $\hat\beta_1$:

$$C_{max} = 2\|\Delta' T\|^2 = \frac{2\hat\sigma^2}{d}\|\Delta_{\beta_1}' - \Delta_{\beta_{(1)}}'\gamma_1\|^2 \qquad v_{max} \propto \Delta' T \propto \Delta_{\beta_1}' - \Delta_{\beta_{(1)}}'\gamma_1.$$

The next two sections apply these results to document more completely the two perturbation schemes given in Chapter 1.

2.4 Perturbations to the Response in Regression

The results of this section closely follow those of Schwarzmann (1991), with only minor additions. As in Chapter 1, a set of additive perturbations is introduced into the observed data to examine the influence of measurement error in the response. An $n \times 1$ vector of perturbations is introduced such that the perturbed likelihood is based on $Y + \omega \sim N(X\beta, \sigma^2 I_n)$. The null perturbation is $\omega_0 = 0_n$.

Equivalent results can be obtained by perturbing the model of the mean to be $X\beta - \omega$, that is, by considering a model assuming $Y \sim N(X\beta - \omega, \sigma^2 I_n)$. Hence, the measurement error perturbation can be interpreted as a type of mean-shift outlier model; each perturbation is a "stand-in" for an additional parameter that could correct or exacerbate lack-of-fit.



2.4.1 Building Blocks

The perturbed likelihood is

$$L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta + \omega)'(y - X\beta + \omega)$$
$$= -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(\tilde{y} + \omega)'(\tilde{y} + \omega)$$
$$= -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(\tilde{y}'\tilde{y} + 2\omega'\tilde{y} + \omega'\omega),$$

where $\tilde{y} = y - X\beta$. The next step is construction of $\Delta$. Partial derivatives with respect to $\omega$ are as follows:

$$\frac{\partial L_\omega(\theta; y)}{\partial\omega} = -\frac{1}{2\sigma^2}(2\tilde{y} + 2\omega) = -\frac{1}{\sigma^2}(\tilde{y} + \omega) = -\frac{1}{\sigma^2}(y - X\beta + \omega).$$

Second partial derivatives are then

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial\omega\,\partial\beta'} = \frac{1}{\sigma^2}X \qquad \frac{\partial^2 L_\omega(\theta; y)}{\partial\omega\,\partial\sigma^2} = \frac{1}{\sigma^4}(y - X\beta + \omega).$$

Evaluating at the original model yields

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial\omega\,\partial\beta'}\bigg|_{\theta=\hat\theta,\,\omega=\omega_0} = \frac{1}{\hat\sigma^2}X \qquad \frac{\partial^2 L_\omega(\theta; y)}{\partial\omega\,\partial\sigma^2}\bigg|_{\theta=\hat\theta,\,\omega=\omega_0} = \frac{1}{\hat\sigma^4}r.$$

From these quantities, the $n \times p$ matrix $\Delta'$ can be constructed:

$$\Delta' = \frac{1}{\hat\sigma^2}\left[ X \quad \frac{1}{\hat\sigma^2}r \right].$$

Finally, the acceleration matrix is

$$F = 2\Delta'(-\ddot{L}^{-1})\Delta = \frac{2}{\hat\sigma^2}H + \frac{4}{n\hat\sigma^4}rr'. \eqno(2.10)$$



2.4.2 Directed Influence on $\hat\sigma^2$

The acceleration matrix for the profile likelihood displacement for $\hat\sigma^2$ is simply the second term of (2.10):

$$F^{[\hat\sigma^2]} = \frac{4}{n\hat\sigma^4}rr'. \eqno(2.11)$$

Solving the characteristic equations $(F^{[\hat\sigma^2]} - \lambda I)v = 0$ directly, the single eigensolution is

$$C_{max} = \frac{4}{n\hat\sigma^4}\|r\|^2 = \frac{4}{\hat\sigma^2} \qquad v_{max} \propto r.$$

Therefore, as mentioned in Chapter 1, $v_{max}$ for directed influence on $\hat\sigma^2$ is proportional to the residuals.

2.4.3 Directed Influence on $\hat\beta_j$

Without loss of generality, consider directed influence on $\hat\beta_1$. Rather than find the acceleration matrix, we proceed directly to determining the sole eigenvector proportional to $\Delta' T$ and the corresponding curvature $2\|\Delta' T\|^2$, as per (2.3) and (2.9). Noting that $d = \|r_1\|^2$,

$$\Delta' T = \frac{1}{\hat\sigma^2}\left[ X_1 \quad X_{(1)} \quad \frac{1}{\hat\sigma^2}r \right]\frac{\hat\sigma}{\sqrt{d}}\begin{pmatrix} 1 \\ -\gamma_1 \\ 0 \end{pmatrix} = \frac{1}{\hat\sigma\|r_1\|}\left[ X_1 - X_{(1)}\gamma_1 \right] = \frac{1}{\hat\sigma\|r_1\|}r_1$$

and

$$C_{max} = 2\|\Delta' T\|^2 = \frac{2}{\hat\sigma^2} \qquad v_{max} \propto r_1.$$

Therefore, the direction of maximum curvature for directed influence on $\hat\beta_j$ is proportional to the $j$th adjusted predictor.

2.4.3.1 Directed influence on $\hat\beta$

The acceleration matrix for the profile likelihood displacement for $\hat\beta$ is the first term of (2.10):

$$F^{[\hat\beta]} = \frac{2}{\hat\sigma^2}H. \eqno(2.12)$$

This matrix has a single non-zero eigenvalue of $\frac{2}{\hat\sigma^2}$ with multiplicity $k$, implying that there are an infinite number of directional vectors that have maximum curvature. Although not discussed by Schwarzmann (1991), one set of orthogonal solutions can be motivated by considering the canonical form of the model:

$$Y = Z\alpha + \epsilon, \eqno(2.13)$$

where

$$Z = X\varphi \qquad \alpha = \varphi'\beta \qquad \varphi = [\varphi_1 \cdots \varphi_k]$$

and $\varphi_i$ is the $i$th eigenvector of $X'X$. Because $\varphi\varphi' = I_k$, this is simply a reparameterization of the original regression model. The new regressors, $Z_1 = X\varphi_1, \ldots, Z_k = X\varphi_k$, are known as the principal components, and they are mutually orthogonal. In addition, each satisfies the characteristic equations of $F^{[\hat\beta]}$:

$$\left( F^{[\hat\beta]} - \frac{2}{\hat\sigma^2}I \right) Z_i = \frac{2}{\hat\sigma^2}(H - I)X\varphi_i = 0. \eqno(2.14)$$

Therefore, the standardized principal component regressors can serve as an orthogonal set of directional vectors, each of which has maximum curvature. Note also that the $k$ adjusted regressors, $r_1, \ldots, r_k$, are also directions of maximum curvature, but they do not constitute an orthogonal set.

2.4.3.2 Influence on $\hat\theta$

The acceleration matrix $F$ given in (2.10) has two non-zero eigenvalues: $\frac{4}{\hat\sigma^2}$ and $\frac{2}{\hat\sigma^2}$, with the second one having multiplicity $k$. The first eigenvector is $\propto r$, and the set of $k$ principal components can be used to complete an orthogonal set of eigenvectors. Confirmation of these solutions is straightforward.
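These eigen-relations are easy to confirm numerically. A minimal sketch, assuming centered regressors with no intercept column, as in the examples above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k)); X -= X.mean(axis=0)           # centered regressors
y = X @ rng.normal(size=k) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)
r = y - H @ y
sig2 = r @ r / n
F_beta = (2.0 / sig2) * H                                   # equation (2.12)

phi = np.linalg.eigh(X.T @ X)[1]                            # eigenvectors of X'X
Z = X @ phi                                                 # principal components
# Each Z_i satisfies (F_beta - (2/sig2) I) Z_i = 0, per (2.14), since HX = X
resid = (F_beta - (2.0 / sig2) * np.eye(n)) @ Z
print(np.max(np.abs(resid)))                                # ~0 up to rounding
```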

2.5 Unequal Measurement Precision Applied to Regression

Cook introduced this perturbation in his original (1986) paper on local influence, and the results in this section come from his work. An $n \times 1$ vector of perturbations is introduced such that the perturbed likelihood is based upon $Y \sim N(X\beta, \sigma^2 D(1/\omega))$, where $D(1/\omega)$ denotes a diagonal matrix with $i$th element $\frac{1}{\omega_i}$. The null perturbation is $\omega_0 = 1_n$. The $n$ basic perturbations would be the local versions of the $n$ Cook's Distances (Cook 1977; Cook 1979). In addition, the $n$ basic perturbations applied to $LD^{[\hat\beta_j]}(\omega)$ are local influence analogs of the $n$ DFBETAS for $\hat\beta_j$.

A second interpretation of this perturbation scheme comes from the idea of case-weights. Case-weights are used in an analysis when each observation has been sampled with unequal probability. Under disproportionate sampling, cases are weighted by the inverse of their sampling probability in order to obtain unbiased regression parameter estimates. In the normal linear model, this weighting is numerically equivalent to perturbing the error variances. Hence, this perturbation scheme can be used to examine sensitivity to the assumption of random sampling.

2.5.1 Building Blocks

To begin, we develop the perturbed likelihood, which is

$$L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'D(\omega)(y - X\beta) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\tilde{y}'D(\omega)\tilde{y}.$$

The next step is construction of $\Delta$. Partial derivatives with respect to $\beta$ are as follows:

$$\frac{\partial L_\omega(\theta; y)}{\partial\beta} = -\frac{1}{2\sigma^2}(-2X'D(\omega)y + 2X'D(\omega)X\beta) = \frac{1}{\sigma^2}X'D(\omega)\tilde{y},$$

and the partial derivative with respect to $\sigma^2$ is

$$\frac{\partial L_\omega(\theta; y)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\tilde{y}'D(\omega)\tilde{y}.$$

Second partial derivatives are then

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial\beta\,\partial\omega'} = \frac{1}{\sigma^2}X'D(\tilde{y}) \qquad \frac{\partial^2 L_\omega(\theta; y)}{\partial\sigma^2\,\partial\omega'} = \frac{1}{2\sigma^4}(\tilde{y} \bullet \tilde{y})',$$

and evaluating at the original model yields

$$\frac{\partial^2 L_\omega(\theta; y)}{\partial\beta\,\partial\omega'}\bigg|_{\theta=\hat\theta,\,\omega=\omega_0} = \frac{1}{\hat\sigma^2}X'D(r) \qquad \frac{\partial^2 L_\omega(\theta; y)}{\partial\sigma^2\,\partial\omega'}\bigg|_{\theta=\hat\theta,\,\omega=\omega_0} = \frac{1}{2\hat\sigma^4}r_{sq}',$$

where $\bullet$ denotes the Hadamard (element-wise) product, implying that $r_{sq}$ is an $n \times 1$ vector with elements $r_i^2$. Combining these we can write

$$\Delta' = \frac{1}{\hat\sigma^2}\left[ D(r)X \quad \frac{1}{2\hat\sigma^2}r_{sq} \right]$$

and

$$F = \frac{2}{\hat\sigma^2}D(r)HD(r) + \frac{1}{n\hat\sigma^4}r_{sq}r_{sq}'. \eqno(2.15)$$



2.5.2 Directed Influence on $\hat\sigma^2$

The acceleration matrix for the profile likelihood displacement for $\hat\sigma^2$ is simply the second term of (2.15):

$$F^{[\hat\sigma^2]} = \frac{1}{n\hat\sigma^4}r_{sq}r_{sq}'. \eqno(2.16)$$

Solving the characteristic equations directly, the single eigensolution is

$$C_{max} = \frac{1}{n\hat\sigma^4}\|r_{sq}\|^2 \qquad v_{max} \propto r_{sq}.$$

This suggests that $\hat\sigma^2$ is most affected by variance perturbations when they are scaled according to the squared residuals.

2.5.3 Directed Influence on $\hat\beta_j$

Without loss of generality, consider directed influence on $\hat\beta_1$. Again we proceed directly to determining the sole eigenvector proportional to $\Delta' T$ and the corresponding curvature $2\|\Delta' T\|^2$ as per (2.3) and (2.9):

$$\Delta' T = \frac{1}{\hat\sigma^2}D(r)\left[ X_1 \quad X_{(1)} \right]\frac{\hat\sigma}{\sqrt{d}}\begin{pmatrix} 1 \\ -\gamma_1 \end{pmatrix} = \frac{1}{\hat\sigma\|r_1\|}D(r)r_1 = \frac{1}{\hat\sigma\|r_1\|}\,r \bullet r_1 \eqno(2.17)$$

and

$$C_{max} = 2\left\| \frac{1}{\hat\sigma\|r_1\|}\,r \bullet r_1 \right\|^2 = \frac{2n\|r \bullet r_1\|^2}{\|r\|^2\|r_1\|^2}. \eqno(2.18)$$

This suggests that maximum local influence of the perturbation on $\hat\beta_1$ is achieved by perturbing cases that have both large residuals and extreme values for the adjusted covariate. Cook (1986) provided additional details of how this result compares to the information in a detrended added-variable plot.

Also, we may compare the information in $v_{max}$ with that provided by the DFBETAS diagnostic (Belsley et al. 1980). Recall that the value of $DFBETAS_{ij}$ is the standardized difference between $\hat\beta_j$ and $\hat\beta_{j,(i)}$, the estimate when deleting the $i$th observation. The formula is given by

$$DFBETAS_{ij} = \frac{c_{ij}r_i}{s\|c_j\|(1 - h_{ii})},$$

where $r_i$ is the $i$th residual, $c_j$ is the $j$th column of $X(X'X)^{-1}$, $c_{ij}$ is the $i$th element of $c_j$, and $s^2 = \frac{SSE}{n-k}$ is the mean squared-error estimate of $\sigma^2$. The vector $c_j$ can be written in terms of the $j$th adjusted regressor by partitioning $(X'X)^{-1}$ as per (2.7). For $j = 1$,

$$c_1 = \left[ X_1 \quad X_{(1)} \right]\frac{1}{d}\begin{pmatrix} 1 \\ -\gamma_1 \end{pmatrix} = \frac{1}{d}r_1.$$

Using this equality, the vector of $n$ DFBETAS for $\hat\beta_j$ is proportional to

$$\begin{pmatrix} \frac{1}{1-h_{11}} \\ \frac{1}{1-h_{22}} \\ \vdots \\ \frac{1}{1-h_{nn}} \end{pmatrix} \bullet r \bullet r_j. \eqno(2.19)$$

Since $v_{max} \propto r \bullet r_j$, observations with large leverage stand out more in a plot of the $n$ DFBETAS for $\hat\beta_j$ than in a plot of $v_{max}$. This algebraic relationship between $v_{max}$ and DFBETAS has not been noted by previous authors.
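The relationship in (2.19), and the result $v_{max} \propto r \bullet r_j$ from (2.17), can be verified numerically. A minimal sketch (the helper names are my own):

```python
import numpy as np

def compare_vmax_dfbetas(X, y, j=0):
    n, k = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    r = y - H @ y
    s2 = r @ r / (n - k)
    # Adjusted regressor: residuals from regressing X_j on the other columns
    Xo = np.delete(X, j, axis=1)
    rj = X[:, j] - Xo @ np.linalg.lstsq(Xo, X[:, j], rcond=None)[0]
    # v_max for directing the variance perturbation at beta_j, per (2.17)
    v = r * rj
    v_max = v / np.linalg.norm(v)
    # DFBETAS vector for beta_j, standardized to length 1
    c = (X @ np.linalg.inv(X.T @ X))[:, j]
    dfb = c * r / (np.sqrt(s2) * np.linalg.norm(c) * (1 - h))
    dfb = dfb / np.linalg.norm(dfb)
    # The two differ element-wise only by the factor 1/(1 - h_ii), per (2.19)
    return v_max, dfb
```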

2.5.4 Directed Influence on $\hat\beta$

The acceleration matrix for directed influence on $\hat\beta$ is the first term of (2.15):

$$F^{[\hat\beta]} = \frac{2}{\hat\sigma^2}\left[ D(r)HD(r) \right].$$

The eigensystem of the matrix does not have a closed form. However, some idea of the directed influence of this perturbation on $\hat\beta$ can be ascertained by utilizing basic perturbations. The $i$th basic perturbation corresponds to modifying the variance of only the $i$th case, and its curvature is the $i$th diagonal element of $F^{[\hat\beta]}$ (Cook 1986):

$$C_{b_i} = \frac{2}{\hat\sigma^2}h_{ii}r_i^2. \eqno(2.20)$$



Figure 2.1: Hills data; variance perturbation. (a) $v_2$ for $\hat\theta$; (b) $v_3$ for $\hat\theta$.



This is similar to the $i$th Cook's Distance (Cook 1977):

$$D_i = \frac{h_{ii}r_i^2}{k s^2 (1 - h_{ii})^2}.$$

2.5.5 Influence on $\hat\theta$

The eigensystem of $F$ is not closed-form, but it can be found numerically. As an example, consider the perturbation of the error variances for the Scottish hills data. There are three non-zero eigenvalues for $F$: 14.82, 4.91, and 0.37. The eigenvector associated with the largest eigenvalue was plotted in Figure 1.10(a). It is similar to $v_{max}$ for directed influence on $\hat\sigma^2$, as given in Figure 1.10(b). The other two eigenvectors for $F$ are plotted in Figure 2.1.

2.6 Benchmarks for Perturbation Analyses

An analyst using local influence diagnostics will want to know when a curvature is "too big" and when an element of $v_{max}$ is "too big." Calibration of local influence diagnostics was the topic of much debate in the discussion and rejoinder to Cook (1986), and there is still no consensus opinion on the subject. Much of the difficulty lies in the fact that the perturbation is introduced into the problem by the analyst. Because the technique does not involve hypothesis testing, critical values are not appropriate. However, there have been attempts to develop rule-of-thumb benchmarks, as exist for Cook's Distance and other common influence diagnostics.

In this section, we discuss some general difficulties in developing benchmarks for perturbation analyses, as well as some issues specific to the local influence approach. Although it is argued that all sensitivity is "relative," some simple methods for assessing the size of curvatures and the size of the elements of directional vectors are presented.

2.6.1 The Analyst's Dilemma

As mentioned before, the perturbation is introduced by the analyst, and this implies that he/she must decide whether the effects of perturbation are important, much the way a researcher decides what effect size is important. Consider a situation in which the researcher is actually able to give the size or stochastic behavior of the perturbation. Here, we may concretely measure the effects of perturbation by whether or not the conclusion of a hypothesis test changes. Alternatively, the effects can be measured by whether or not the perturbed estimates are outside an unperturbed confidence ellipsoid. However, despite having a concrete measure of the perturbation's effects, assessing the practical relevance of the results still falls to the analyst.

2.6.2 Calibrating Curvatures in Local Influence 

In addition to the general limitations of a perturbation analysis, an analyst 
using the local influence approach must also contend with the fact that the behavior 
and size of the perturbations are undetermined apart from the basic formulation. 
Loynes (1986) and Beckman et al. (1987) noted that curvatures do not have 
the invariance properties needed for a catch-all method of benchmarking. This 
necessitates a reliance upon relative influence. 



2.6.2.1 1-to-1 coordinate changes in $\Omega$

Following Loynes (1986) and Beckman et al. (1987), let us consider 1-to-1 coordinate changes in the perturbation space $\Omega$. That is, a new perturbation scheme is constructed with elements $\omega_i^* = k(\omega_i)$ for $i = 1, \ldots, q$, for a smooth function $k$. The new scheme's perturbation, $\omega^*$, will not yield the same curvature as $\omega$. Specifically,

$$C_{\omega^*} = \left( \frac{\partial k(\omega_i)}{\partial\omega_i}\bigg|_{\omega_i=\omega_{i0}} \right)^2 C_\omega,$$

where $\omega_{i0}$ denotes the null perturbation value for $\omega_i$.

As an example, consider perturbing the variance of $y_i$ to be $\frac{\sigma^2}{\omega_i^2}$ rather than $\frac{\sigma^2}{\omega_i}$. Here, $k(\omega_i) = \omega_i^2$, resulting in new curvatures that are $\left( \frac{\partial\omega_i^2}{\partial\omega_i}\big|_{\omega_i=1} \right)^2 = (2\omega_i)^2\big|_{\omega_i=1} = 4$ times the original ones. Hence, despite the fact that the two versions of $\Omega$ can produce the same set of perturbed models, the curvature in identical directions will change. This implies that a universal benchmark for curvatures is not possible. Indeed, given the arbitrariness of scaling the perturbation, it should come as no surprise that curvatures cannot be compared across perturbation schemes. However, the direction of maximum curvature is unchanged under 1-to-1 coordinate changes in $\Omega$.

2.6.2.2 Interpreting curvatures

In this section, a simple way to assess curvatures from the same perturbation
scheme is given. First, $\frac{1}{2}a^2C_v$ is a Taylor series approximation to $LD(\omega_0 + av)$
(Lawrance 1991; Escobar and Meeker 1992). Also, assuming that $n$ is large, an
asymptotic $100(1-\alpha)\%$ likelihood-based confidence region for $\theta$ is given by the values
of $\theta$ such that

$$2\{L(\hat\theta; y) - L(\theta; y)\} \le \chi^2_{p,1-\alpha}.$$

Thus, if the curvature in direction $v$ is larger than twice $\chi^2_{p,1-\alpha}$, then a perturbation
at a distance of one from $\omega_0$ in direction $v$ moves the parameter estimates past the
edge of the asymptotic confidence region.

To further develop this idea, let us define the size of a perturbation to be
its Euclidean distance from the null perturbation (i.e., $\|\omega - \omega_0\|$). Suppose further
that we have perturbations in two different directions: $\omega_A$ in direction $v_A$ and $\omega_B$
in direction $v_B$. If $|C_A|$ is $k$ times larger than $|C_B|$, then perturbation $\omega_B$ must be
$\sqrt{k}$ times larger than $\omega_A$ to have the same effect on the likelihood displacement.
Equivalently, in order to perturb the estimates to the edge of the confidence ellipsoid,
it would be necessary to travel $\sqrt{k}$ times farther in direction $v_B$ in Ω than in direction
$v_A$. This provides a simple interpretation for the relative size of two curvatures.
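To make the distance interpretation concrete, the following sketch (illustrative Python, not part of the original analysis; the curvature values and dimension are hypothetical) converts a curvature into the perturbation distance needed to reach the edge of the confidence region, using the approximation $LD(\omega_0 + av) \approx \frac{1}{2}a^2C_v$:

```python
# A minimal sketch converting curvatures into the perturbation distance at
# which the approximate displacement (a^2/2)*C_v reaches the chi-square cutoff.
from scipy.stats import chi2

def distance_to_edge(curvature, p, alpha=0.05):
    """Distance a along v at which (a^2/2)*curvature equals the cutoff."""
    cutoff = chi2.ppf(1 - alpha, df=p)
    return (2 * cutoff / curvature) ** 0.5

# Two directions whose curvatures differ by a factor k = C_A/C_B require
# perturbation sizes differing by sqrt(k):
C_A, C_B, p = 8.0, 2.0, 3
print(distance_to_edge(C_B, p) / distance_to_edge(C_A, p))  # sqrt(C_A/C_B) = 2.0
```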

2.6.2.3 Internal benchmarks for curvatures 

Building on the interpretation given above, curvatures can be benchmarked
based on their relative size to some sort of "average" curvature for a given scheme
and fitted model. For instance, the average of the $q$ basic curvatures for the fitted
model can serve as a benchmark. Alternatively, the average curvature could be
computed by (1) allowing the squared elements of $v$ to have a Dirichlet($a_1, \ldots, a_q$)
distribution with all $a_j$ equal to one, and then (2) finding the expected curvature.
This would be equivalent to finding the average approximate displacement at a
radius of $\sqrt{2}$ from $\omega_0$ (Cook 1986). If such a benchmark is difficult to compute, a
Monte Carlo estimate can be calculated. Because these benchmarks are computed
using curvatures from the fitted model, they result in comparisons that are internal
to the fitted model. The conformal normal curvature of Poon and Poon (1999)
would also be an internal way to assess the sizes of curvatures.
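A sketch of such a Monte Carlo estimate follows (illustrative Python; it assumes the $q \times q$ acceleration matrix $\ddot F$ has already been computed for the fitted model, and the function name and test matrix are mine). Drawing directions uniformly on the unit sphere gives their squared elements the Dirichlet(1, …, 1) distribution described above:

```python
# Monte Carlo internal benchmark: the expected curvature over directions v
# drawn uniformly on the unit sphere.
import numpy as np

def average_curvature(F, n_draws=10000, seed=0):
    rng = np.random.default_rng(seed)
    q = F.shape[0]
    V = rng.standard_normal((n_draws, q))
    V /= np.linalg.norm(V, axis=1, keepdims=True)       # uniform directions
    return np.mean(np.einsum('ij,jk,ik->i', V, F, V))   # mean of v'Fv

# For symmetric F the exact answer is trace(F)/q, a useful check:
F = np.diag([4.0, 1.0, 1.0])
print(average_curvature(F), np.trace(F) / 3)
```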

2.6.2.4 External benchmarks for curvatures

It is entirely possible that all of the curvatures of a fitted model are large
due to the nature of the data set, and using internal benchmarks will not alert
the researcher to such a phenomenon. To address this, a benchmark that is not
dependent on the observed data is needed. To this end, we might utilize the
expected value of the curvature in direction $v$, using $\hat\theta$ in place of the unknown
parameters $\theta$. The expected curvatures can then be plotted along with the observed
curvatures (Cadigan and Farrell 1999). If the expectation of $C_{\max}$ is difficult to
obtain, Monte Carlo simulations could be used to estimate it. As an alternative,
Schall and Dunne (1992) developed a means of external benchmarking by constructing
a scaled curvature using results on parameter orthogonality (Cox and Reid 1987).
2.6.2.5 Internal versus external benchmarks 

Internal and external benchmarks are complementary tools, since each one
provides different diagnostic information. Comparing curvatures from the same
fitted model highlights relative sensitivity given the data and the design. Comparing
curvatures to their expectations highlights whether the sensitivities are expected
given the experimental or sampling design.

2.6.3 Calibration of Directional Vectors 

For a directional vector from the eigensystem of $\ddot F$, the analyst must compare
the individual elements in the vector to determine which aspects of the perturbation
are most important. Typically, it is suggested that the analyst simply look for
elements that stand out as being large relative to the rest. Alternately, a vector
whose elements are all equal in size can serve as a means of comparison. That is to
say, if each of the $q$ elements of the perturbation contributes equally, then each
is equal to either $1/\sqrt{q}$ or $-1/\sqrt{q}$. A particular element whose absolute value is larger
than, say, 2 or 3 times that of equally contributing elements (e.c.e.) would then be
of interest. Hence, the inclusion of horizontal lines at $\pm 2/\sqrt{q}$ and $\pm 3/\sqrt{q}$ in the plot of
directional vectors may be useful, as in the sketch below.
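A minimal plotting sketch (illustrative Python; `v_max` is assumed to be an already-computed directional vector, and the function name is mine):

```python
# Plot the elements of a directional vector with horizontal reference lines at
# 2 and 3 times the size of equally contributing elements (1/sqrt(q)).
import numpy as np
import matplotlib.pyplot as plt

def plot_direction(v_max):
    q = len(v_max)
    ece = 1.0 / np.sqrt(q)
    plt.scatter(np.arange(1, q + 1), v_max)
    for m in (2 * ece, 3 * ece):
        plt.axhline(m, linestyle='--')
        plt.axhline(-m, linestyle='--')
    plt.xlabel('element'); plt.ylabel('v_max')
    plt.show()
```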

2.7 Local Assessment Under Reparameterizations

Pan et al. (1997) show that $v_{\max}$ is invariant to any 1-to-1 measurable
transformation of the parameter space Θ. In fact, for $\theta = h(\phi)$, the entire
eigensystem of $\ddot F_\phi$ is the same as that of $\ddot F_\theta$. However, directed influence on the

estimates of the redefined parameters can then be examined. Using similar steps to
Pan et al. (1997) and Escobar and Meeker (1992), we can write update formulas
for $\Delta_\phi$ and $\ddot L_\phi$ and then calculate directed influence on a subset of the redefined
parameter estimates. Using partial differentiation, the building blocks of the
acceleration matrix under the parameterization $\phi$ can be written

$$\Delta_\phi = \frac{\partial\theta'}{\partial\phi}\,\Delta_\theta \tag{2.21}$$

$$\ddot L_\phi = \frac{\partial\theta'}{\partial\phi}\,\ddot L_\theta\,\frac{\partial\theta}{\partial\phi'}, \tag{2.22}$$

where all partial derivatives are evaluated at the fitted model. From this, the
acceleration matrix for directed influence on $\hat\phi_1$ can be obtained from the partition
$\phi = (\phi_1', \phi_2')'$:

$$\ddot F^{[\phi_1]} = -2\Delta_\phi'\left(\ddot L_\phi^{-1} - B_{\phi 22}\right)\Delta_\phi = -2\Delta_\theta'\left(\ddot L_\theta^{-1} - \frac{\partial\theta}{\partial\phi'}B_{\phi 22}\frac{\partial\theta'}{\partial\phi}\right)\Delta_\theta, \tag{2.23}$$

where

$$B_{\phi 22} = \begin{bmatrix} 0 & 0 \\ 0 & \ddot L_{\phi 22}^{-1} \end{bmatrix},$$

$\ddot L_{\phi 22}$ comes from the partition

$$\ddot L_\phi = \begin{bmatrix} \ddot L_{\phi 11} & \ddot L_{\phi 12} \\ \ddot L_{\phi 21} & \ddot L_{\phi 22} \end{bmatrix},$$

and all partial derivatives are again evaluated at the fitted model.

Although Pan et al. (1997) suggested that (2.23) is invariant to the choice
of $\phi_2$, they did not provide a proof. To show this result, consider another
reparameterization for which the first $p_1$ elements are the same as those of $\phi$:

$$\psi = \begin{bmatrix} \psi_1 \\ \psi_2 \end{bmatrix} = \begin{bmatrix} \phi_1 \\ \psi_2 \end{bmatrix}.$$

This implies that



$$\frac{\partial\phi'}{\partial\psi} = \begin{bmatrix} I_{p_1} & \dfrac{\partial\phi_2'}{\partial\psi_1} \\ 0 & \dfrac{\partial\phi_2'}{\partial\psi_2} \end{bmatrix}, \tag{2.24}$$

and this leads to the following expression for the lower right submatrix of $\ddot L_\psi$:

$$\ddot L_{\psi 22} = \frac{\partial\theta'}{\partial\psi_2}\,\ddot L_\theta\,\frac{\partial\theta}{\partial\psi_2'} = \frac{\partial\phi_2'}{\partial\psi_2}\,\frac{\partial\theta'}{\partial\phi_2}\,\ddot L_\theta\,\frac{\partial\theta}{\partial\phi_2'}\,\frac{\partial\phi_2}{\partial\psi_2'} = \frac{\partial\phi_2'}{\partial\psi_2}\,\ddot L_{\phi 22}\,\frac{\partial\phi_2}{\partial\psi_2'}, \tag{2.25}$$

where all partial derivatives are evaluated at the original model. Meanwhile, the
acceleration matrix for directed influence on $\hat\psi_1$ is

$$\ddot F^{[\psi_1]} = -2\Delta_\psi'\left(\ddot L_\psi^{-1} - B_{\psi 22}\right)\Delta_\psi = -2\Delta_\theta'\left(\ddot L_\theta^{-1} - \frac{\partial\theta}{\partial\psi'}B_{\psi 22}\frac{\partial\theta'}{\partial\psi}\right)\Delta_\theta, \tag{2.26}$$

where $B_{\psi 22}$ contains $\ddot L_{\psi 22}^{-1}$ in its lower right block and zeros elsewhere.
The invariance of the acceleration matrix for directed influence on $\hat\phi_1$ to the choice
of $\phi_2$ is proved by showing that (2.26) is equivalent to (2.23). Because $\psi$ is a 1-to-1
reparameterization, $\partial\phi_2/\partial\psi_2'$ is invertible, so the Jacobians cancel when the inverse
of (2.25) is taken:

$$\frac{\partial\theta}{\partial\psi'}B_{\psi 22}\frac{\partial\theta'}{\partial\psi} = \frac{\partial\theta}{\partial\phi_2'}\frac{\partial\phi_2}{\partial\psi_2'}\left(\frac{\partial\phi_2'}{\partial\psi_2}\,\ddot L_{\phi 22}\,\frac{\partial\phi_2}{\partial\psi_2'}\right)^{-1}\frac{\partial\phi_2'}{\partial\psi_2}\frac{\partial\theta'}{\partial\phi_2} = \frac{\partial\theta}{\partial\phi_2'}\,\ddot L_{\phi 22}^{-1}\,\frac{\partial\theta'}{\partial\phi_2} = \frac{\partial\theta}{\partial\phi'}B_{\phi 22}\frac{\partial\theta'}{\partial\phi}. \tag{2.27}$$

Substituting (2.27) into (2.26) completes the result.

2.7.1 Directed Influence on a Linear Combination of Parameter Estimates 

As mentioned in Escobar and Meeker (1992), a specific application of these
results is to consider directed influence on a linear combination of parameter
estimates, $x'\hat\theta$. There are many ways to reparameterize the model such that $x'\theta$ is
one of the new parameters. For example, assuming that the first element of $x$ is
non-zero, $\phi$ can be defined as $A\theta$, where

$$A = \begin{bmatrix} x_1 & x_{(1)}' \\ 0_{p-1} & I_{p-1} \end{bmatrix},$$

where $x_1$ and $x_{(1)}$ are the first and remaining elements of $x$, respectively (Escobar
and Meeker 1992). Using equation (2.23), the acceleration matrix for directed
influence on $\hat\phi_1 = x'\hat\theta$ can be computed, and the resulting diagnostics are invariant
to the choice of $\phi_2, \ldots, \phi_p$. By using a prudent choice of new parameters, we can
obtain easy-to-compute formulas for the single eigensolution of $\ddot F^{[\phi_1]}$, as per the
following theorem.



Theorem 2.1

The maximum curvature and its direction for directed influence on $x'\hat\theta$ are given by

$$C_{\max} = \frac{2}{\|k_1\|^2}\left\|\Delta_\theta'\ddot L_\theta^{-1}x\right\|^2 \qquad v_{\max} \propto \Delta_\theta'\ddot L_\theta^{-1}x,$$

where $k_1 = \ddot L_\theta^{-1/2}x$.

Proof of Theorem 2.1

To begin, consider a model with parameterization $\phi = K'\ddot L_\theta^{1/2}\theta$, where
$K = \begin{bmatrix} k_1 & k_2 & \cdots & k_p \end{bmatrix}$ is a full rank $p \times p$ matrix with

$$k_1 = \ddot L_\theta^{-1/2}x \qquad k_i'k_j = 0, \;\; i \ne j \qquad k_i'k_i = k_1'k_1, \;\; i = 2, \ldots, p.$$

By construction, $K'K = \|k_1\|^2 I_p$. Using the update formulas in (2.21) and (2.22) it follows that

$$\Delta_\phi = \frac{1}{\|k_1\|^2}\,K'\ddot L_\theta^{-1/2}\Delta_\theta \qquad \ddot L_\phi = \frac{1}{\|k_1\|^2}\,I_p.$$



Next, we show that directed influence on $\hat\phi_1 = x'\hat\theta$ is not a function of $k_2, \ldots, k_p$.
Since $\ddot L_\phi^{-1} = \|k_1\|^2I_p$ and $B_{\phi 22} = \|k_1\|^2\,\mathrm{diag}(0, I_{p-1})$, we have
$\ddot L_\phi^{-1} - B_{\phi 22} = \|k_1\|^2\,\mathrm{diag}(1, 0, \ldots, 0)$. From equation (2.23), we find

$$\ddot F^{[\phi_1]} = -2\Delta_\phi'\left(\ddot L_\phi^{-1} - B_{\phi 22}\right)\Delta_\phi = -\frac{2}{\|k_1\|^2}\,\Delta_\theta'\ddot L_\theta^{-1/2}k_1k_1'\ddot L_\theta^{-1/2}\Delta_\theta = -\frac{2}{\|k_1\|^2}\,\Delta_\theta'\ddot L_\theta^{-1}xx'\ddot L_\theta^{-1}\Delta_\theta. \tag{2.28}$$

The single eigensolution of this rank-one matrix yields the final solution for $C_{\max}$
and $v_{\max}$. □

The construction of $\phi$ is worth some note. The premultiplication of $\theta$ by
$K'\ddot L_\theta^{1/2}$ orthogonalizes the parameters, in the sense that the off-diagonal elements
of the observed information matrix for $\phi$ are zero. (Note that the orthogonalization
is with respect to the observed information matrix as in Tawn (1987), rather
than the expected information as in Cox and Reid (1987).) First, $\alpha = \ddot L_\theta^{1/2}\theta$
is a parameterization with an observed information matrix equal to the identity
matrix. Then, $K$ is constructed to make orthogonal contrasts of $\alpha$, with the first
column corresponding to the linear combination of interest. Finally, by scaling the
columns of $K$ to all have the same length as $k_1$, $K$ becomes a scalar multiple of
an orthonormal matrix, and this leads to the algebraic simplifications. Also, if
$x'\theta = \theta_j$, the local influence diagnostics provided by the theorem are the same as
those provided by (2.3), as per Beckman et al. (1987).




2.7.2 Application to Regression 

As an example, consider perturbing the variances of a regression model and
examining influence on a linear combination of the regression parameter estimates,
$\hat\mu(x) = x'\hat\beta$. From (2.28), we find

$$\ddot F^{[\phi_1]} = \frac{2}{\hat\sigma^2\kappa}\,D(r)X(X'X)^{-1}xx'(X'X)^{-1}X'D(r), \tag{2.29}$$

where $\kappa = x'(X'X)^{-1}x$. Further,

$$v_{\max} \propto D(r)X(X'X)^{-1}x \propto D\!\left(\widehat{\mathrm{Cov}}(X\hat\beta, x'\hat\beta)\right)r \tag{2.30}$$

$$C_{\max} = \frac{2}{\hat\sigma^2\kappa}\,\left\|x'(X'X)^{-1}X'D(r)\right\|^2, \tag{2.31}$$

where $\widehat{\mathrm{Cov}}(X\hat\beta, x'\hat\beta)$ is the vector of empirical covariances between $X\hat\beta$ and $x'\hat\beta$.
2.7.2.1 A DFFITS-type measure

Using the variance perturbation and directed influence on $x'\hat\beta$, a local
influence analog of DFFITS can be defined. Recall that DFFITS$_i$ is defined as the
externally studentized change in the $i$th fitted value when deleting the $i$th observation
(Belsley et al. 1980):

$$\mathrm{DFFITS}_i = \frac{\hat\mu(x_i) - \hat\mu(x_i)_{[i]}}{s_{[i]}\sqrt{h_{ii}}},$$

where $\hat\mu(x_i)_{[i]}$ is the estimate of $\mu(x_i)$ with the $i$th observation deleted and $s^2_{[i]}$ is the
mean squared-error estimate of $\sigma^2$ with the $i$th observation deleted. A large value
for DFFITS indicates that the observation is somewhat "self-predicting". It can be
shown that DFFITS$_i$ simplifies to

$$\mathrm{DFFITS}_i = \frac{r_i\sqrt{h_{ii}}}{s_{[i]}(1 - h_{ii})}.$$

A similar measure can be obtained by examining local influence on
$\mu(x_i) = x_i'\beta$ when using the $i$th basic variance perturbation. That is, only the
$i$th observation is reweighted, and the curvature for directed influence on $\hat\mu(x_i)$ is
examined. The curvature for the $i$th basic perturbation is the $i$th diagonal element of $\ddot F^{[\phi_1]}$:

$$C_{b_i} = \mathrm{BasicFITS}_i = \frac{2r_i^2h_{ii}}{\hat\sigma^2}. \tag{2.32}$$

A large value of BasicFITS indicates that the perturbation of the observation's
precision/weighting greatly influences its fitted value. Thus it is a local influence
analog of DFFITS. Coincidentally, BasicFITS$_i$ is equivalent to the $i$th basic curvature
for the acceleration matrix, as given in (2.20). That implies that for the variance
perturbation, the $n$ basic curvatures of $LD(\omega)$ are measuring influence on the $n$
fitted values.

While BasicFITS$_i$ is the $i$th basic curvature when directing influence on $\mu(x_i)$,
we can also define MaxFITS$_i$ to be $C_{\max}$ for directed influence on $\mu(x_i)$. Using
(2.31),

$$C_{\max_i} = \mathrm{MaxFITS}_i = \frac{2}{\hat\sigma^2 h_{ii}}\sum_{j=1}^n r_j^2h_{ij}^2. \tag{2.33}$$

By definition, the $i$th MaxFITS can be no smaller than the $i$th BasicFITS. While
BasicFITS provides information about which observation's perturbation influences
its fitted value, MaxFITS provides information about how sensitive each fitted value
is relative to the others. A short computational sketch follows.
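This sketch (illustrative Python, not the author's code; the function name is mine) computes both measures from a design matrix and response, using the MLE $\hat\sigma^2 = r'r/n$ as in the text:

```python
# BasicFITS and MaxFITS for a linear model, following (2.32) and (2.33).
import numpy as np

def basic_and_max_fits(X, y):
    n = len(y)
    H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
    r = y - H @ y                           # residuals
    sigma2 = r @ r / n                      # MLE of sigma^2
    h = np.diag(H)
    basic = 2 * r**2 * h / sigma2                 # (2.32)
    maxfits = 2 * (H**2 @ r**2) / (sigma2 * h)    # (2.33): sum_j r_j^2 h_ij^2
    return basic, maxfits
```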

Example 

Consider examining influence on the fitted values for each of the 35 covariate
profiles in the Scottish hills data set. Figure 2.2(a) displays the 35 BasicFITS (i.e.,
each of the basic curvatures for directed influence on the corresponding $\hat\mu(x_i)$).
Race 7 is indicated as being influential on its fitted value, while race 18 is also shown



[Figure 2.2: Hills data; $\hat\sigma^2D(1/\omega)$ perturbation; influence on fitted values. Panels:
(a) BasicFITS; (b) MaxFITS; (c) BasicFITS (stars connected by solid line) and
MaxFITS (triangles connected by dashed line).]



to be quite influential on its fitted value. Plot (b) shows the 35 MaxFITS (i.e., the
$C_{\max}$'s for directed influence on each of the 35 $\hat\mu(x_i)$). Since no point stands out
as being large, the plot indicates that no fitted value is particularly sensitive to
variance perturbations. Plot (c) shows both sets of diagnostics. The BasicFITS
values are stars joined by a solid line, while the MaxFITS values are triangles joined
by a dashed line. The closeness of the MaxFITS and BasicFITS values for race 7
indicates that this observation is somewhat self-predicting.






CHAPTER 3
PERTURBATIONS TO THE X MATRIX IN REGRESSION

In this chapter, the influence that infinitesimal errors in the X matrix can 

have on parameter estimates in a linear regression is addressed. First, previous 

results for single column and single row perturbations are reviewed. This is followed 

by a generalization of these results to perturbing all elements in the X matrix. It is 

shown that the eigensolution of $\ddot F$ is related to principal components regression,

collinearity, and outliers. Finally, new graphical techniques are presented with the 

results applied to the Scottish hills data set and an educational outcomes data set. 

3.1 Perturbations to Explanatory Variables 
Letting $k$ denote the number of predictors, Cook (1986) proposed that an
$nk$-sized perturbation can be incorporated into the design matrix in an additive
fashion. These perturbations represent fixed errors in the regressors and should
not be confused with random errors-in-variables techniques (Fuller 1987). The
perturbation scheme can be used to assess the effects of small rounding or truncation
errors in the explanatory variables. Upon perturbation, the $X$ matrix becomes $X + W$, where

$$W = \begin{bmatrix} \omega_1 & \omega_{n+1} & \omega_{2n+1} & \cdots & \omega_{kn-n+1} \\ \omega_2 & \omega_{n+2} & \omega_{2n+2} & \cdots & \omega_{kn-n+2} \\ \omega_3 & \omega_{n+3} & \omega_{2n+3} & \cdots & \omega_{kn-n+3} \\ \vdots & \vdots & \vdots & & \vdots \\ \omega_n & \omega_{2n} & \omega_{3n} & \cdots & \omega_{kn} \end{bmatrix}.$$

The use of a single subscript implies $\omega = \mathrm{vec}(W)$, where the vec operator stacks the
columns of the matrix into a single column. When particular elements of $W$ greatly
affect parameter estimation, it indicates that the corresponding element of $X$ is



influential. This allows the analyst to identify specific combinations of explanatory
variable values that influence estimation. As will be shown in Section 3.4, sensitivity
of the regression parameter estimates to these kinds of perturbations is greater when
there is collinearity amongst the regressors and when the model is calibrated to fit
the observed data too closely.

3.2 Perturbations to a Single Column 
The special case of only perturbing a single column of X was discussed 
in Cook (1986), Thomas and Cook (1989), and Cook and Weisberg (1991). This 
section recreates their results in detail. 

3.2.1 Building Blocks 

Here and in the following sections, $d_i$ represents a vector with a 1 in the $i$th
position and zeros elsewhere. The length of the vector depends on the context of its
use. Note that for $d_i$ of size $k$,

$$d_i'(X'X)^{-1}d_i = \frac{1}{\|r_i\|^2} \tag{3.1}$$

and

$$X(X'X)^{-1}d_i = \frac{1}{\|r_i\|^2}\,r_i, \tag{3.2}$$

where $r_i$ contains the residuals from regressing $X_i$ on the other columns of $X$.

Without loss of generality, suppose perturbations are added to the $n$ values
of the first predictor (i.e., $X_1$ becomes $X_1 + \omega$). The perturbed likelihood is

$$L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta - \beta_1\omega)'(y - X\beta - \beta_1\omega). \tag{3.3}$$



The next step is the construction of $\Delta$. The partial derivative with respect to $\omega$ is

$$\frac{\partial L_\omega(\theta;y)}{\partial\omega} = \frac{\beta_1}{\sigma^2}(y - X\beta - \beta_1\omega).$$

Second partial derivatives are then

$$\frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\beta_1} = \frac{1}{\sigma^2}\left(y - X_{(1)}\beta_{(1)} - 2\beta_1(X_1 + \omega)\right) \qquad \frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\beta_{(1)}'} = -\frac{\beta_1}{\sigma^2}X_{(1)}$$

$$\frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\sigma^2} = -\frac{\beta_1}{\sigma^4}(y - X\beta - \beta_1\omega).$$

Evaluating at the original model yields

$$\frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\beta_1}\bigg|_{\hat\theta,\,\omega_0} = \frac{1}{\hat\sigma^2}(r - \hat\beta_1X_1) \qquad \frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\beta_{(1)}'}\bigg|_{\hat\theta,\,\omega_0} = -\frac{\hat\beta_1}{\hat\sigma^2}X_{(1)} \qquad \frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\sigma^2}\bigg|_{\hat\theta,\,\omega_0} = -\frac{\hat\beta_1}{\hat\sigma^4}r,$$

and from these quantities we construct the $n \times p$ matrix

$$\Delta' = \frac{1}{\hat\sigma^2}\begin{bmatrix} r - \hat\beta_1X_1 & -\hat\beta_1X_{(1)} & -\frac{\hat\beta_1}{\hat\sigma^2}r \end{bmatrix} = \frac{1}{\hat\sigma^2}\begin{bmatrix} rd_1' - \hat\beta_1X & -\frac{\hat\beta_1}{\hat\sigma^2}r \end{bmatrix}, \tag{3.4}$$

where $d_1$ is a $k \times 1$ vector. It follows that the acceleration matrix is given by

$$\ddot F = \frac{2}{\hat\sigma^4}\begin{bmatrix} rd_1' - \hat\beta_1X & -\frac{\hat\beta_1}{\hat\sigma^2}r \end{bmatrix}\begin{bmatrix} \hat\sigma^2(X'X)^{-1} & 0 \\ 0' & \frac{2\hat\sigma^4}{n} \end{bmatrix}\begin{bmatrix} d_1r' - \hat\beta_1X' \\ -\frac{\hat\beta_1}{\hat\sigma^2}r' \end{bmatrix} \tag{3.5}$$

$$= \frac{2}{\hat\sigma^2}\left[\frac{1}{\|r_1\|^2}rr' + \hat\beta_1^2H - \frac{\hat\beta_1}{\|r_1\|^2}(rr_1' + r_1r')\right] + \frac{4\hat\beta_1^2}{n\hat\sigma^4}rr',$$

where $H = X(X'X)^{-1}X'$ is the hat matrix.



3.2.2 Directed Influence on $\hat\sigma^2$

The acceleration matrix for the profile likelihood displacement for $\hat\sigma^2$ is given by

$$\ddot F^{[\sigma^2]} = \frac{4\hat\beta_1^2}{n\hat\sigma^4}\,rr'. \tag{3.6}$$

Solving the characteristic equations $(\ddot F - \lambda I)v = 0$ directly, the single
eigensolution is

$$C_{\max} = \frac{4\hat\beta_1^2}{\hat\sigma^2} \qquad v_{\max} \propto r. \tag{3.7}$$

Therefore the direction of maximum curvature for directed influence on $\hat\sigma^2$ is always
proportional to the residuals regardless of which column of $X$ is perturbed, but the
corresponding curvature is dependent upon the effect size, $|\hat\beta_1|$.

3.2.3 Directed Influence on $\hat\beta_1$

Using the results of Beckman et al. (1987), the sole eigenvector is proportional
to $\Delta'\Upsilon$ and the corresponding curvature is $2\|\Delta'\Upsilon\|^2$ as per (2.3):

$$v_{\max} \propto \frac{1}{\hat\sigma\|r_1\|}\left(r - \hat\beta_1X_1 + \hat\beta_1X_{(1)}\hat\gamma_1\right) \propto r - \hat\beta_1r_1, \tag{3.8}$$

where $\hat\gamma_1$ contains the coefficients from regressing $X_1$ on the other columns (so that
$r_1 = X_1 - X_{(1)}\hat\gamma_1$), and

$$C_{\max} = \frac{2}{\hat\sigma^2\|r_1\|^2}\,\|r - \hat\beta_1r_1\|^2 = 2\left(\frac{n}{\|r_1\|^2} + \frac{\hat\beta_1^2}{\hat\sigma^2}\right). \tag{3.9}$$

The direction of maximum curvature is a weighted average of the residuals and the
adjusted regressor. As noted in Cook and Weisberg (1991), $\hat\beta_1r_1 = \hat y - \hat y_{(1)}$, where
$\hat y_{(1)}$ is the vector of fitted values when $X_1$ is excluded from the model. Using this
result, we find $v_{\max} \propto 2r - r_{[1]}$, where $r_{[1]}$ is the vector of residuals when modeling
$y$ using all regressors except $X_1$. This indicates that the direction is dominated by
the residuals, a fact which is demonstrated by examples later in this chapter. Cook
and Weisberg (1991) discussed how $v_{\max}$ relates to de-trended added variable plots.
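The following sketch (illustrative Python; the function name is mine) computes the single-column diagnostics (3.8)-(3.9) directly from the adjusted regressor:

```python
# Directed influence on beta_j when perturbing column j of X:
# v_max ∝ r - beta_j * r_j and C_max = 2(n/||r_j||^2 + beta_j^2/sigma2).
import numpy as np

def column_perturbation_diagnostics(X, y, j):
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    sigma2 = r @ r / n                       # MLE of sigma^2
    others = np.delete(np.arange(k), j)
    gamma = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)[0]
    r_j = X[:, j] - X[:, others] @ gamma     # adjusted regressor
    v_max = r - beta[j] * r_j
    C_max = 2 * (n / (r_j @ r_j) + beta[j]**2 / sigma2)
    return C_max, v_max / np.linalg.norm(v_max)
```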

3.2.4 Directed Influence on Other Regression Parameter Estimates 

Thomas and Cook (1989) examined the influence that perturbations in one
column of $X$ can have on the regression parameter estimate corresponding to a
different column. Without loss of generality, suppose we perturb the first column of
$X$ and look at the effects on the last regression parameter estimate, $\hat\beta_k$. Again using
the results of Beckman et al. (1987) we can obtain the maximum curvature and its
direction. Letting $d_1$ be of size $k - 1$,

$$v_{\max} \propto \Delta'\Upsilon \propto \hat\gamma_{k1}r + \hat\beta_1r_k, \tag{3.10}$$

where $\hat\gamma_{k1}$ is the regression parameter estimate corresponding to $X_1$ when predicting
$X_k$ with the other columns of the $X$ matrix. The maximum curvature is given by

$$C_{\max} = \frac{2}{\hat\sigma^2\|r_k\|^2}\,\|\hat\gamma_{k1}r + \hat\beta_1r_k\|^2 = \frac{2}{\hat\sigma^2}\left(\hat\beta_1^2 + \hat\gamma_{k1}^2\,\frac{\|r\|^2}{\|r_k\|^2}\right). \tag{3.11}$$

When the partial correlation between $X_1$ and $X_k$ is zero, $C_{\max}$ achieves its minimum
value of $2\hat\beta_1^2/\hat\sigma^2$, yielding $v_{\max} \propto r_k$. In the event that $X_k$ is a linear combination of the
other regressors with non-zero $\hat\gamma_{k1}$, $C_{\max} = \infty$ and $v_{\max} \propto r$. Between these two
extremes, the direction of maximum curvature is a weighted average of the residuals
and the adjusted $k$th regressor. These quantities are used to detect outliers and
measure leverage, respectively. If the partial correlation between $X_1$ and $X_k$ is as
large as that between $X_1$ and $y$, then the two vectors are equally weighted. Note
that this weighting is invariant to scale changes in the columns of the design matrix
as well as the response.
3.2.4.1 Directed influence on $\hat\beta$

Without loss of generality, we again consider perturbing the first column.
The rank-$k$ acceleration matrix is

$$\ddot F^{[\beta]} = \frac{2}{\hat\sigma^2}\left[\frac{1}{\|r_1\|^2}rr' + \hat\beta_1^2H - \frac{\hat\beta_1}{\|r_1\|^2}(rr_1' + r_1r')\right]. \tag{3.12}$$

The first eigenvalue of this matrix is unique, and yields a maximum curvature and
corresponding direction of

$$C_{\max} = 2\left(\frac{n}{\|r_1\|^2} + \frac{\hat\beta_1^2}{\hat\sigma^2}\right) \qquad v_1 \propto r - \hat\beta_1r_1.$$

Hence, the maximum curvature and its direction are identical to that of directed
influence on $\hat\beta_1$. Confirmation of this first eigensolution follows from the identities
$Hr = 0$, $Hr_1 = r_1$, and $r'r_1 = 0$:

$$\frac{\hat\sigma^2}{2}\,\ddot F^{[\beta]}(r - \hat\beta_1r_1) = \left(\frac{\|r\|^2}{\|r_1\|^2} + \hat\beta_1^2\right)r - \left(\hat\beta_1^3 + \hat\beta_1\frac{\|r\|^2}{\|r_1\|^2}\right)r_1 = \left(\frac{\|r\|^2}{\|r_1\|^2} + \hat\beta_1^2\right)(r - \hat\beta_1r_1).$$

The second eigenvalue is $2\hat\beta_1^2/\hat\sigma^2$ and has multiplicity $k - 1$. As mentioned earlier,
this implies that there is no unique set of orthogonal eigenvectors to complete the
solution.

3.2.5 Influence on $\hat\theta$

The acceleration matrix for examining both $\hat\beta$ and $\hat\sigma^2$ simultaneously has
$k + 1 = p$ non-zero eigenvalues. The form of the eigensolution is unknown. My
numerical studies indicate that there are three unique non-zero eigenvalues, one of
which has multiplicity $k - 1$. The one with higher multiplicity appears to be the
second largest of the three, with a value of $2\hat\beta_1^2/\hat\sigma^2$.

Example 

Huynh (1982) and Fung and Kwan (1997) examined data from 20 randomly
sampled schools from the Mid-Atlantic and New England States. The mean verbal
test score (VSCORE) of each school's 6th graders is regressed on staff salaries per
pupil (SALARY), % of white-collar fathers (WCOLR), a socio-economic composite
score (SES), mean teacher's verbal score (TSCORE), and mean mother's educational


Table 3.1: School data; Pearson correlation coefficients

          SALARY  WCOLR   SES    TSCORE  MOMED  VSCORE
SALARY     1.00
WCOLR      0.18    1.00
SES        0.22    0.82   1.00
TSCORE     0.50    0.05   0.18    1.00
MOMED      0.19    0.92   0.81    0.12    1.00
VSCORE     0.19    0.75   0.92    0.33    0.73   1.00



level (MOMED). Appendix A contains model fitting information and diagnostics for
the regression analysis, with all variables centered and scaled.

Diagnostic plots of the residuals, externally studentized residuals, Cook's
Distances, diagonal elements of the hat matrix, and DFBETAS appear in Figure
3.1. Ordinary least squares regression indicates cases 3 and 18 are outliers, although
Huynh (1982) also identified observation 11 as a mild outlier. This data set also
has some redundancy among the predictors SES, WCOLR and MOMED (Table
3.1). Although the $X$ matrix is not ill-conditioned, the association between SES
and MOMED accounts for the unexpected negative slope estimate for MOMED.

Table 3.2 shows the non-zero curvatures (eigenvalues) for perturbing
single columns of the $X$ matrix. Comparing the columns of the table provides
information on which regressor's measurement is most influential. Here we see
that the measurement of SES is typically the most influential, both for regression
parameter estimates and for $\hat\sigma^2$. Comparing the rows provides information on which
parameter estimates are most influenced by these kinds of perturbations. Here we
see that the WCOLR and MOMED estimates are the most sensitive. These are the two
regressors that are highly correlated with each other and with SES. It comes as no
surprise that the results for collinear explanatory variables are more susceptible to
measurement errors of this nature.



[Figure 3.1: School data; diagnostic plots. Panels: (a) Residuals; (b) Externally
studentized residuals; (c) Cook's Distances; (d) Hat diagonals; (e) DFBETAS for
SALARY; (f) DFBETAS for WCOLR; (g) DFBETAS for SES; (h) DFBETAS for
TSCORE; (i) DFBETAS for MOMED.]



Table 3.2: School data; curvatures using $X_1 + \omega$ perturbation

Estimates in LD(ω)                Column being perturbed
or LD^[θ₁](ω)     SALARY     WCOLR      SES        TSCORE    MOMED
$\hat\theta$      λ₁=4.144   λ₁=20.156  λ₁=50.008  λ₁=6.658  λ₁=19.065
                  λ₂=0.440   λ₂=0.847   λ₂=19.002  λ₂=1.413  λ₂=0.932
                  λ₃=0.094   λ₃=0.071   λ₃=14.441  λ₃=0.599  λ₃=0.091
$\hat\beta$       λ₁=3.357   λ₁=18.534  λ₁=26.445  λ₁=4.432  λ₁=17.292
                  λ₂=0.440   λ₂=0.847   λ₂=19.002  λ₂=1.413  λ₂=0.932
$\hat\beta_1$     3.357      0.956      19.008     2.126     0.951
$\hat\beta_2$     0.458      18.534     19.890     1.585     10.869
$\hat\beta_3$     0.443      2.957      26.445     1.501     1.640
$\hat\beta_4$     1.129      1.859      19.220     4.432     1.295
$\hat\beta_5$     0.444      11.590     19.324     1.480     17.292
$\hat\sigma^2$    0.881      1.693      38.004     2.825     1.865



Since perturbing the SES variable is most influential, examining directional
vectors for this perturbation is of interest. The element corresponding to case 18 is
the most noticeable in the plot of $v_{\max}$ for the likelihood displacement (Figure 3.2).
This observation has a large residual and is indicated as being influential in the
various diagnostic plots in Figure 3.1. Also note the similarity of Figure 3.2 to the
residuals (Figure 3.1(a)).

Plots (a)-(e) of Figure 3.3 show the directions of maximum curvature when
directing influence on each of the individual regression parameter estimates. In
general the plots indicate different elements of $X_3$ to be influential on different $\hat\beta_i$.
Finally, Figure 3.3(f) provides the direction of maximum curvature when directing
influence on $\hat\sigma^2$. Recall that this vector is proportional to the residuals.



[Figure 3.2: School data; $X_3 + \omega$ perturbation; $v_{\max}$ for influence on $\hat\theta$.]

3.3 Perturbations to a Single Row 
The special case of only perturbing a single row of X was discussed in 
Thomas and Cook (1989) in generalized linear models. Here, detailed derivations 
for regression are presented. The results indicate that single row perturbations have 
limited diagnostic value.



3.3.1 Building Blocks 

Without loss of generality, suppose perturbations are added to the $k$ regressor
values of the first observation (i.e., $x_1'$ becomes $x_1' + \omega'$). For $d_1$ being of order $n$,
the perturbed likelihood is

$$L_\omega(\theta; y) \propto -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta - d_1\omega'\beta)'(y - X\beta - d_1\omega'\beta). \tag{3.13}$$

The partial derivative with respect to $\omega$ is

$$\frac{\partial L_\omega(\theta;y)}{\partial\omega} = \frac{1}{\sigma^2}\left(y_1 - x_1'\beta - \omega'\beta\right)\beta. \tag{3.14}$$



[Figure 3.3: School data; $X_3 + \omega$ perturbation; directed influence plots. Panels:
(a) $v_{\max}$ for $\hat\beta_1$; (b) $v_{\max}$ for $\hat\beta_2$; (c) $v_{\max}$ for $\hat\beta_3$; (d) $v_{\max}$ for $\hat\beta_4$;
(e) $v_{\max}$ for $\hat\beta_5$; (f) $v_{\max}$ for $\hat\sigma^2$.]



Since the null perturbation is $0$, evaluating at $\omega_0$ simplifies the expression. We then
proceed to take second partial derivatives:

$$\frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\beta'}\bigg|_{\omega=\omega_0} = \frac{1}{\sigma^2}\left[(y_1 - x_1'\beta)I_k - \beta x_1'\right] \qquad \frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\sigma^2}\bigg|_{\omega=\omega_0} = -\frac{y_1 - x_1'\beta}{\sigma^4}\,\beta.$$

Evaluating at $\theta = \hat\theta$ and combining the portions for the regression parameter
estimates we obtain

$$\frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\beta'}\bigg|_{\hat\theta,\,\omega_0} = \frac{1}{\hat\sigma^2}\left(r_1I_k - \hat\beta x_1'\right) \qquad \frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\sigma^2}\bigg|_{\hat\theta,\,\omega_0} = -\frac{r_1}{\hat\sigma^4}\,\hat\beta,$$

where $r_1$ is the first residual. From these quantities we can construct the $k \times p$
matrix

$$\Delta' = \frac{1}{\hat\sigma^2}\begin{bmatrix} r_1I_k - \hat\beta x_1' & -\frac{r_1}{\hat\sigma^2}\hat\beta \end{bmatrix}. \tag{3.15}$$

The acceleration matrix is then given by

$$\ddot F = \frac{2}{\hat\sigma^2}\left[h_{11}\hat\beta\hat\beta' + r_1^2(X'X)^{-1} - r_1\hat\beta x_1'(X'X)^{-1} - r_1(X'X)^{-1}x_1\hat\beta'\right] + \frac{4r_1^2}{n\hat\sigma^4}\,\hat\beta\hat\beta'.$$
3.3.2 Directed Influence on $\hat\sigma^2$

The acceleration matrix for the profile likelihood displacement for $\hat\sigma^2$ is given by

$$\ddot F^{[\sigma^2]} = \frac{4r_1^2}{n\hat\sigma^4}\,\hat\beta\hat\beta'. \tag{3.16}$$

Solving the characteristic equations $(\ddot F - \lambda I)v = 0$ directly, the single eigensolution is

$$C_{\max} = \frac{4r_1^2}{n\hat\sigma^4}\,\|\hat\beta\|^2 \qquad v_{\max} \propto \hat\beta.$$

Notice how the curvature depends upon the residual for the case that is being
perturbed, but the directional vector is always proportional to $\hat\beta$ regardless of which
row of $X$ is being perturbed. This suggests that $v_{\max}$ is not very useful for single
row perturbations. The values of $C_{\max}$ for each row can be compared to assess which
row is most influential when perturbed. However, the relative effects only depend
upon the residuals, indicating there is little value added by plotting the curvatures.



3.3.3 Directed Influence on $\hat\beta_j$

Without loss of generality, we can direct influence on $\hat\beta_1$:

$$v_{\max} \propto \Delta'\Upsilon \propto r_1\begin{pmatrix} 1 \\ -\hat\gamma_1 \end{pmatrix} + (\hat x_{11} - x_{11})\hat\beta$$

and

$$C_{\max} = \frac{2}{\hat\sigma^2\|r_1\|^2}\,\left\|r_1\begin{pmatrix} 1 \\ -\hat\gamma_1 \end{pmatrix} + (\hat x_{11} - x_{11})\hat\beta\right\|^2,$$

where $\hat\gamma_1$ contains the coefficients from regressing $X_1$ on the other columns, and
$\hat x_{11}$ is the predicted value of $x_{11}$ from modeling $X_1$ as a linear combination
of the other regressors. More generally, the maximum local influence on $\hat\beta_j$ by
perturbing row $i$ is

$$C_{\max} = \frac{2}{\hat\sigma^2\|r_j\|^2}\,\left\|r_i(d_j - \hat\gamma_j) + (\hat x_{ij} - x_{ij})\hat\beta\right\|^2 \qquad v_{\max} \propto r_i(d_j - \hat\gamma_j) + (\hat x_{ij} - x_{ij})\hat\beta,$$

where $\hat\gamma_j$ is the vector of coefficients from regressing $X_j$ on the other columns,
padded with a zero in position $j$. If the columns of $X$ have all been standardized, then each of the elements of $\hat\gamma_j$ and
of $\hat\beta$ is less than 1. This suggests that element $j$ of $v_{\max}$ is likely to be larger than
the other elements. However, the signs of $r_i$ and $(\hat x_{ij} - x_{ij})\hat\beta_j$ can work to cancel
each other out. Further interpretation is suppressed in lieu of the more general
each other out. Further interpretation is suppressed in lieu of the more general 



perturbation presented in Section 3.4. Note that the curvature and directional
vector change if one or more of the columns of $X$ are rescaled.

[Figure 3.4: School data; maximum curvatures for single row perturbations.]

3.3.4 Influence on $\hat\beta$ and $\hat\theta$

The acceleration matrix for directed influence on $\hat\beta$ is

$$\ddot F^{[\beta]} = \frac{2}{\hat\sigma^2}\left[h_{11}\hat\beta\hat\beta' + r_1^2(X'X)^{-1} - r_1\hat\beta x_1'(X'X)^{-1} - r_1(X'X)^{-1}x_1\hat\beta'\right]. \tag{3.17}$$

The eigensystem of this matrix is unknown, but my numerical examples indicate
that it has $k$ distinct non-zero eigenvalues. The eigensystem of $\ddot F$ for the full set of
parameter estimates is also unknown.

3.3.5 Example 

The values of school 18's regressors are most influential when perturbing rows
of $X$ one at a time, as evidenced by a large maximum curvature (Figure 3.4). For
perturbing row 18, $v_{\max}$ is nearly proportional to $\hat\beta$ (Figure 3.5). Note that although
element 3 of $v_{\max}$ appears quite large, it is actually less than the benchmark of
$2/\sqrt{5} = 0.894$.



[Figure 3.5: School data; $x_{18}' + \omega'$ perturbation; $v_{\max}$ for $\hat\theta$.]



3.4 Perturbations to the Entire Design Matrix

At this point, the general case of perturbing all $nk$ elements of the design
matrix is considered. We use $W$ to denote the perturbations in matrix form:

$$W = \begin{bmatrix} \omega_1 & \omega_{n+1} & \omega_{2n+1} & \cdots & \omega_{kn-n+1} \\ \omega_2 & \omega_{n+2} & \omega_{2n+2} & \cdots & \omega_{kn-n+2} \\ \omega_3 & \omega_{n+3} & \omega_{2n+3} & \cdots & \omega_{kn-n+3} \\ \vdots & \vdots & \vdots & & \vdots \\ \omega_n & \omega_{2n} & \omega_{3n} & \cdots & \omega_{kn} \end{bmatrix},$$

and $\omega_j$ to denote the $j$th column of $W$.

3.4.1 Building Blocks

The derivations in this section only require the presence of an intercept in
the model (or centering of the response $y$). However, for ease of interpretation, we
assume the columns of $X$ have been centered and scaled. The perturbed likelihood is

$$L_\omega(\theta; y) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta - W\beta)'(y - X\beta - W\beta). \tag{3.18}$$
Proceeding in a similar fashion to the derivations for perturbing a single column,
the partial derivative with respect to $\omega$ is

$$\frac{\partial L_\omega(\theta;y)}{\partial\omega} = \frac{1}{\sigma^2}\begin{bmatrix} \beta_1(y - X\beta) - \beta_1W\beta \\ \vdots \\ \beta_k(y - X\beta) - \beta_kW\beta \end{bmatrix}.$$

Since the null perturbation is $0$, evaluating at $\omega_0$ simplifies the expression:

$$\frac{\partial L_\omega(\theta;y)}{\partial\omega}\bigg|_{\omega=\omega_0} = \frac{1}{\sigma^2}\begin{bmatrix} \beta_1(y - X\beta) \\ \beta_2(y - X\beta) \\ \vdots \\ \beta_k(y - X\beta) \end{bmatrix} = \frac{1}{\sigma^2}\left(\beta\otimes(y - X\beta)\right). \tag{3.19}$$

We then proceed to take second partial derivatives and evaluate at $\hat\theta$:

$$\frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\beta'}\bigg|_{\hat\theta,\,\omega_0} = \frac{1}{\hat\sigma^2}\left(I_k\otimes r - \hat\beta\otimes X\right) \qquad \frac{\partial^2 L_\omega(\theta;y)}{\partial\omega\,\partial\sigma^2}\bigg|_{\hat\theta,\,\omega_0} = -\frac{1}{\hat\sigma^4}\left(\hat\beta\otimes r\right). \tag{3.20}$$



From these quantities we can construct the $nk \times p$ matrix $\Delta'$, as given by Cook
(1986):

$$\Delta' = \frac{1}{\hat\sigma^2}\begin{bmatrix} I_k\otimes r - \hat\beta\otimes X & -\frac{1}{\hat\sigma^2}(\hat\beta\otimes r) \end{bmatrix}, \tag{3.21}$$

where $\otimes$ denotes the Kronecker product (Searle 1982).

The next step is to construct the acceleration matrix. Here we use the fact
that $(A\otimes B)(C\otimes D) = AC\otimes BD$, assuming conformability of the matrices.
Noting that $A = A\otimes 1 = 1\otimes A$, the acceleration matrix can be written as

$$\ddot F = \frac{2}{\hat\sigma^2}\left(I_k\otimes r - \hat\beta\otimes X\right)(X'X)^{-1}\left(I_k\otimes r' - \hat\beta'\otimes X'\right) + \frac{4}{n\hat\sigma^4}\left(\hat\beta\hat\beta'\otimes rr'\right). \tag{3.22}$$



3.4.2 Directed Influence on $\hat\sigma^2$

The acceleration matrix for the profile likelihood displacement for $\hat\sigma^2$ is the
second term of (3.22):

$$\ddot F^{[\sigma^2]} = \frac{4}{n\hat\sigma^4}\left(\hat\beta\hat\beta'\otimes rr'\right). \tag{3.23}$$

This matrix has a single non-zero eigenvalue, $\frac{4}{\hat\sigma^2}\|\hat\beta\|^2$, with corresponding
eigenvector $v \propto \hat\beta\otimes r$. This solution is confirmed below:

$$\left(\ddot F^{[\sigma^2]} - \lambda I\right)v \propto \left(\frac{4}{n\hat\sigma^4}\left(\hat\beta\hat\beta'\otimes rr'\right) - \frac{4}{n\hat\sigma^4}\|\hat\beta\|^2\|r\|^2I\right)\left(\hat\beta\otimes r\right)$$
$$= \frac{4}{n\hat\sigma^4}\left(\|\hat\beta\|^2\hat\beta\otimes\|r\|^2r - \|\hat\beta\|^2\|r\|^2\,\hat\beta\otimes r\right) \tag{3.24}$$
$$= 0.$$



This result implies maximum local influence on $\hat\sigma^2$ is achieved by perturbations in
each column that are proportional to the residuals scaled by the corresponding $\hat\beta_j$;
the brief numerical check below illustrates the eigensolution.
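A quick numerical check (an illustrative sketch with simulated inputs, not dissertation code) that $\hat\beta\otimes r$ is the sole non-null eigendirection of (3.23):

```python
# Verify that beta ⊗ r is an eigenvector of (4/(n*sigma^4)) * (bb' ⊗ rr')
# with eigenvalue 4||beta||^2 / sigma^2.
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2
beta, r = rng.standard_normal(k), rng.standard_normal(n)
sigma2 = r @ r / n
F = (4 / (n * sigma2**2)) * np.kron(np.outer(beta, beta), np.outer(r, r))
v = np.kron(beta, r)
lam = (4 / sigma2) * (beta @ beta)
print(np.allclose(F @ v, lam * v))   # True
```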

3.4.3 Directed Influence on $\hat\beta$

In this section, we derive the full eigensystem of $\ddot F^{[\beta]}$, where

$$\ddot F^{[\beta]} = \frac{2}{\hat\sigma^2}\left(I_k\otimes r - \hat\beta\otimes X\right)(X'X)^{-1}\left(I_k\otimes r' - \hat\beta'\otimes X'\right).$$

The following theorem provides the full eigensystem of $\ddot F^{[\beta]}$ for the $X + W$
perturbation in a regression analysis. Note that the eigenvalues in the theorem appeared
in Cook (1986), but the eigenvectors appear here for the first time.

Theorem 3.1

Recall that $\varphi_j$ is the $j$th eigenvector of $X'X$ and that $X\varphi_j$ is the $j$th principal
component. Letting $\delta_j$ denote the $j$th eigenvalue of $X'X$, the $k$ non-zero eigenvalues
and eigenvectors of $\ddot F^{[\beta]}$ are

$$\lambda_j = 2\left(\frac{n}{\delta_{k-j+1}} + \frac{\|\hat\beta\|^2}{\hat\sigma^2}\right) \qquad v_j \propto \varphi_{k-j+1}\otimes r - \hat\beta\otimes X\varphi_{k-j+1}, \tag{3.25}$$

for $j = 1, \ldots, k$.

Proof of Theorem 3.1

Because the algebraic expressions are so long, the eigensolutions are confirmed in
two steps. Noting that $1/\delta_{k-j+1}$ and $\varphi_{k-j+1}$ are the $j$th eigensolution of $(X'X)^{-1}$,
and using $X'r = 0$ and $\|r\|^2 = n\hat\sigma^2$, we have

$$\ddot F^{[\beta]}\left(\varphi_{k-j+1}\otimes r\right) - \lambda_j\left(\varphi_{k-j+1}\otimes r\right) = -\frac{2\|\hat\beta\|^2}{\hat\sigma^2}\left(\varphi_{k-j+1}\otimes r\right) - \frac{2n}{\delta_{k-j+1}}\left(\hat\beta\otimes X\varphi_{k-j+1}\right). \tag{3.26}$$

The following matrix multiplication yields the same expression as (3.26):

$$\ddot F^{[\beta]}\left(\hat\beta\otimes X\varphi_{k-j+1}\right) - \lambda_j\left(\hat\beta\otimes X\varphi_{k-j+1}\right) = -\frac{2\|\hat\beta\|^2}{\hat\sigma^2}\left(\varphi_{k-j+1}\otimes r\right) - \frac{2n}{\delta_{k-j+1}}\left(\hat\beta\otimes X\varphi_{k-j+1}\right). \tag{3.27}$$

Using (3.26) and (3.27), confirmation of the $j$th eigensolution is immediate:

$$\left(\ddot F^{[\beta]} - \lambda_jI\right)v_j = \left(\ddot F^{[\beta]} - \lambda_jI\right)\left(\varphi_{k-j+1}\otimes r\right) - \left(\ddot F^{[\beta]} - \lambda_jI\right)\left(\hat\beta\otimes X\varphi_{k-j+1}\right) = 0. \tag{3.28}$$

Finally, for $j \ne j^*$, the orthogonality of the solutions follows:

$$\left(\varphi_j\otimes r - \hat\beta\otimes X\varphi_j\right)'\left(\varphi_{j^*}\otimes r - \hat\beta\otimes X\varphi_{j^*}\right)$$
$$= \varphi_j'\varphi_{j^*}\,r'r - \varphi_j'\hat\beta\,r'X\varphi_{j^*} - \hat\beta'\varphi_{j^*}\,\varphi_j'X'r + \hat\beta'\hat\beta\,\varphi_j'X'X\varphi_{j^*} = 0 - 0 - 0 + \|\hat\beta\|^2\,\delta_{j^*}\,\varphi_j'\varphi_{j^*} = 0. \;\; \Box$$



3.4.3.1 Interpreting Theorem 3.1

The closed-form eigensolution in Theorem 3.1 gives insight into how slight
changes to the observed explanatory variables affect inference. To interpret the
theorem, consider the maximum curvature, as per (3.25):

$$C_{\max} = 2\left(\frac{n}{\delta_k} + \frac{\|\hat\beta\|^2}{\hat\sigma^2}\right). \tag{3.29}$$

The eigenvalues of $X'X$ are traditionally used to assess the amount of collinearity
in the regressors. If $\delta_k$ is small, there is redundancy in the explanatory variables,
and this causes the inversion of $X'X$ to be unstable. If this inversion is unstable,
so too are the regression parameter estimates, $\hat\beta = (X'X)^{-1}X'y$. The first term in
$C_{\max}$ is large when $\delta_k$ is small, giving further evidence that the estimates are more
sensitive when collinearity is present. The second term of $C_{\max}$ is also interesting.
It can be considered a naive measure of goodness of fit; when the relationships
between the explanatory variables and $y$ are strong, $\|\hat\beta\|^2$ is large and $\hat\sigma^2$ is small.
Hence, "better fitting" models are more sensitive to perturbation. However, $\|\hat\beta\|^2/\hat\sigma^2$
typically grows whenever additional variables are added to the model, indicating
more complicated models are also more sensitive to perturbation. In fact, $\|\hat\beta\|^2/\hat\sigma^2$
bears some similarity to $R^2$:

$$R^2 = \frac{\|X\hat\beta\|^2}{SS_y} = \frac{\|X\hat\beta\|^2}{y'y} = \frac{\|X\hat\beta\|^2}{n\hat\sigma^2 + \|X\hat\beta\|^2}.$$

The quantity $R^2$ is the proportion of variance explained by the model, and serves
as a measure of goodness-of-fit. However, $R^2$ can be made artificially large by
adding additional explanatory variables, even when they contribute very little to the
explanatory value of the model. The quantity $\|\hat\beta\|^2/\hat\sigma^2$ behaves similarly: additional
variables decrease the denominator and usually increase the numerator. Of course,
the scale of the regressors is a major factor in the size of $\|\hat\beta\|^2$, indicating that
scaling the regressors would be prudent for interpretation. A computational sketch of
the theorem's eigensolution is given below.
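The sketch (illustrative Python; it assumes the columns of X are centered and scaled as in the text, and the function name is mine) computes the eigenpairs of Theorem 3.1 directly:

```python
# Theorem 3.1 eigensystem for the X + W perturbation:
# lambda_j = 2(n/delta_{k-j+1} + ||beta||^2/sigma2),
# v_j ∝ phi_{k-j+1} ⊗ r - beta ⊗ X phi_{k-j+1}.
import numpy as np

def xw_eigensystem(X, y):
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    sigma2 = r @ r / n
    # eigh returns ascending eigenvalues: index 0 is delta_k (smallest),
    # so j = 0 below pairs with delta_k and gives lambda_1 = C_max.
    delta, phi = np.linalg.eigh(X.T @ X)
    lams, vecs = [], []
    for j in range(k):
        lams.append(2 * (n / delta[j] + beta @ beta / sigma2))
        v = np.kron(phi[:, j], r) - np.kron(beta, X @ phi[:, j])
        vecs.append(v / np.linalg.norm(v))
    return np.array(lams), np.column_stack(vecs)
```

The returned vector of length $nk$ can be reshaped column-by-column into the $n \times k$ matrix $W_{\max}$ of (3.31) for plotting.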

So, we may interpret $C_{\max}$ in the following way. The vector of parameter
estimates is more sensitive when (a) there are collinear variables, (b) there are a large
number of explanatory variables, and (c) the model closely fits the data. We are not
surprised to see that the first two phenomena are sources of sensitivity. However,
the suggestion that better-fitting models are more sensitive to measurement error
may at first be disconcerting. When viewed in light of Type I and Type II errors,
this is not the case. Recall that a Type I error occurs when a test is wrongly
declared significant, while a Type II error occurs when a test is wrongly declared
insignificant. Generally, a Type I error is considered the worse of the two. So, if
there are strong relationships between $X$ and $y$, the formula for $C_{\max}$ implies this
might be missed because of measurement error in the regressors. However, if there
is a weak relationship, perturbations to $X$ would not make the relationship appear
strong. In this sense, the regression model is better protected against Type I errors
(concluding there are strong relationships when there are none) than Type II errors
(concluding there are weak relationships when in fact there are strong ones) when
there is measurement error in the regressors.

As a final comment on $C_{\max}$, the term $\|\hat\beta\|^2/\hat\sigma^2$ may be better described as
an indicator of a closely fitting model rather than a well fitting model. This suggests
that if we construct complicated models to fit the nuances of a data set, the resulting

parameter estimates are more sensitive to rounding and measurement errors in the 
explanatory variables. 

Now let us consider $v_{\max}$ as per Theorem 3.1:

$$v_{\max} \propto \varphi_k\otimes r - \hat\beta\otimes X\varphi_k. \tag{3.30}$$

This vector and the other eigenvectors of $\ddot F^{[\beta]}$ provide detailed information about
which elements of $X$ most affect the stability of the MLEs. The expression for
$v_{\max}$ shows how outliers and leverage in the final principal component regressor
determine precisely what kinds of mismeasurements have large local influence on $\hat\beta$.
The directional vector can be used to scale $W$ to achieve maximum local influence:

$$W_{\max} \propto \begin{bmatrix} \varphi_{k1}r - \hat\beta_1X\varphi_k & \varphi_{k2}r - \hat\beta_2X\varphi_k & \cdots & \varphi_{kk}r - \hat\beta_kX\varphi_k \end{bmatrix}. \tag{3.31}$$

The elements in the first column are a weighted average of the residuals ($r$) and the
regressor values of the last principal component regressor ($X\varphi_k$). The respective
weights are the first original regressor's contribution to the final principal component
($\varphi_{k1}$) and the negative of the first regression parameter estimate ($-\hat\beta_1$).

Alternatively, we can think of the elements in the first row as being a
weighted average of the explanatory variables' contributions to the final principal
component ($\varphi_k'$) and the negative of the parameter estimates ($-\hat\beta'$). The respective
weights are the first residual ($r_1$) and the negative of the first observation's value for
the final principal component ($-x_1'\varphi_k$).
3.4.3.2 A special case

In order to examine in more detail how the perturbations given in (3.31)
affect $\hat\beta$, we consider a simple situation for which there are closed forms for the
perturbed parameter estimates. Let us assume that the response and a pair of
regressors have been centered and scaled to have length one, and that the regressors
are orthogonal. The implications of these simplifications are as follows:

$$\|y\| = \|X_1\| = \|X_2\| = 1 \qquad X_1'X_2 = 0 \qquad r'r = 1 - \hat\beta_1^2 - \hat\beta_2^2$$
$$\hat\beta_1 = X_1'y \qquad \hat\beta_2 = X_2'y \qquad \delta_1 = \delta_2 = 1 \qquad \varphi = I_2.$$

The first two eigenvalues of $\ddot F^{[\beta]}$ are the same:

$$\lambda_1 = \lambda_2 = 2\left(n + \frac{\|\hat\beta\|^2}{\hat\sigma^2}\right),$$

and we may choose either of the following vectors for $v_{\max}$, written in matrix form:

$$\begin{bmatrix} r - \hat\beta_1X_1 & -\hat\beta_2X_1 \end{bmatrix} \qquad \begin{bmatrix} -\hat\beta_1X_2 & r - \hat\beta_2X_2 \end{bmatrix}.$$

Without loss of generality we choose the first one. Letting $X_\omega$ denote the perturbed
design matrix, we have

$$X_\omega = X + aW = X + a\begin{bmatrix} r - \hat\beta_1X_1 & -\hat\beta_2X_1 \end{bmatrix} = \begin{bmatrix} ar + (1 - a\hat\beta_1)X_1 & X_2 - a\hat\beta_2X_1 \end{bmatrix} \equiv \begin{bmatrix} X_{1,\omega} & X_{2,\omega} \end{bmatrix} \tag{3.32}$$

for some real number $a$. Now, consider the effects of perturbation on $\hat\beta_1$. Assuming
a no-intercept model is still fit after perturbation, the perturbed estimate is given by

$$\hat\beta_{1,\omega} = \frac{r_{1,\omega}'y}{\|r_{1,\omega}\|^2}, \tag{3.33}$$



where $r_{1,\omega}$ is the residuals from fitting $X_{1,\omega}$ as a function of $X_{2,\omega}$. The building
blocks needed are as follows:

$$H_{2,\omega} = X_{2,\omega}(X_{2,\omega}'X_{2,\omega})^{-1}X_{2,\omega}' = \frac{1}{1 + a^2\hat\beta_2^2}\,(X_2 - a\hat\beta_2X_1)(X_2 - a\hat\beta_2X_1)'$$
$$r_{1,\omega} = (I - H_{2,\omega})X_{1,\omega} = (1 - a\hat\beta_1 - ac\hat\beta_2)X_1 + cX_2 + ar$$
$$r_{1,\omega}'y = (1 - a\hat\beta_1 - ac\hat\beta_2)\hat\beta_1 + c\hat\beta_2 + a(1 - \hat\beta_1^2 - \hat\beta_2^2)$$
$$\|r_{1,\omega}\|^2 = (1 - a\hat\beta_1 - ac\hat\beta_2)^2 + c^2 + a^2(1 - \hat\beta_1^2 - \hat\beta_2^2),$$

where $c = \dfrac{a\hat\beta_2(1 - a\hat\beta_1)}{1 + a^2\hat\beta_2^2}$. It follows that

$$\hat\beta_{1,\omega} = \frac{(1 - a\hat\beta_1 - ac\hat\beta_2)\hat\beta_1 + c\hat\beta_2 + a(1 - \hat\beta_1^2 - \hat\beta_2^2)}{(1 - a\hat\beta_1 - ac\hat\beta_2)^2 + c^2 + a^2(1 - \hat\beta_1^2 - \hat\beta_2^2)}.$$

For the sake of comparison, we can also consider perturbing only the first
column. Perturbation in the direction of $v_{\max}$ when directing effects on $\hat\beta_1$ results in
the following perturbed $X$ matrix:

$$X_\omega = X + a^*W^* = X + a^*\begin{bmatrix} r - \hat\beta_1X_1 & 0 \end{bmatrix} = \begin{bmatrix} a^*r + (1 - a^*\hat\beta_1)X_1 & X_2 \end{bmatrix}, \tag{3.34}$$

as per (3.8). Here, the perturbed estimate is given by

$$\hat\beta_{1,\omega} = \frac{(1 - a^*\hat\beta_1)\hat\beta_1 + a^*(1 - \hat\beta_1^2 - \hat\beta_2^2)}{(1 - a^*\hat\beta_1)^2 + a^{*2}(1 - \hat\beta_1^2 - \hat\beta_2^2)}.$$

In order for the two perturbations to be comparable, they should be equal in size.
This is accomplished by stacking the columns of the perturbation matrices and
making the resulting vectors equal length. In other words, the Frobenius norms
of $aW$ and $a^*W^*$ should be equal. Letting $a^* = a/\sqrt{1 - \hat\beta_2^2}$ provides the necessary
adjustment.



Now we can plot $\hat\beta_{1,\omega}$ for some selected values of $\hat\beta_1$ and $\hat\beta_2$, considering values
of $a$ ranging from $-\sqrt{2}$ to $\sqrt{2}$. This eliminates perturbation matrices with
Frobenius norms that are larger than that of the unperturbed $X$ matrix. Figure 3.6
shows plots of $\hat\beta_{1,\omega}$ for four sets of parameter estimates. The solid lines represent the
parameter estimate in the direction $v_{\max}$ for the two-column perturbation, while the
dashed line represents parameter estimates in the direction $v_{\max}$ for the one-column
perturbation. All of the influence curves in plots (a) and (b) have a value of .2 under
the null perturbation, i.e., when $a = 0$. Likewise, the curves in plots (c) and (d) are
all equal to .5 at the null perturbation. In all plots we see that the single column
perturbation does not have the same effect as the dual column perturbation. The
discrepancy between their effects is larger when $\hat\beta_2$ is larger (plots (b) and (d)).
A computational sketch of these curves follows.
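The sketch below implements the closed forms above (illustrative Python; it assumes the orthonormal special-case setup of this subsection, and the function names are mine):

```python
# Influence curves behind Figure 3.6: the perturbed estimate beta_{1,w}(a)
# for the two-column direction versus the matched one-column direction.
import numpy as np

def beta1_two_col(a, b1, b2):
    c = a * b2 * (1 - a * b1) / (1 + a**2 * b2**2)
    num = (1 - a * b1 - a * c * b2) * b1 + c * b2 + a * (1 - b1**2 - b2**2)
    den = (1 - a * b1 - a * c * b2)**2 + c**2 + a**2 * (1 - b1**2 - b2**2)
    return num / den

def beta1_one_col(a, b1, b2):
    a_star = a / np.sqrt(1 - b2**2)          # match Frobenius norms
    num = (1 - a_star * b1) * b1 + a_star * (1 - b1**2 - b2**2)
    den = (1 - a_star * b1)**2 + a_star**2 * (1 - b1**2 - b2**2)
    return num / den

a = np.linspace(-np.sqrt(2), np.sqrt(2), 201)
print(beta1_two_col(0.0, 0.2, 0.4))          # 0.2 at the null perturbation
```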

3.4.4 Influence in Principal Components Regression

Consider the canonical form of the model, as given in Chapter 1:

$$Y = Z\alpha + \epsilon, \tag{3.35}$$

where $Z = X\varphi$ and $\alpha = \varphi'\beta$. The following theorem proves that the direction of
maximum curvature is the direction of maximum directed influence on $\hat\alpha_k$.

Theorem 3.2

Let $\hat\alpha$ denote the MLEs of $\alpha$ for model (3.35). The pairs $\lambda_j$, $v_j$ given in Theorem
3.1 are the maximum curvature and direction of maximum curvature for directed
influence on $\hat\alpha_{k-j+1}$, $j = 1, \ldots, k$.



[Figure 3.6: Effects of single and double column perturbations on $\hat\beta_1$. Solid line:
double column; dashed line: column 1 only. Panels: (a) influence curves for
$\hat\beta_1 = .2$, $\hat\beta_2 = .4$; (b) $\hat\beta_1 = .2$, $\hat\beta_2 = .7$; (c) $\hat\beta_1 = .5$, $\hat\beta_2 = .4$;
(d) $\hat\beta_1 = .5$, $\hat\beta_2 = .7$.]



Proof of Theorem 3.2

We begin by obtaining update formulas for $\Delta$ and $\ddot L$ as per Section 2.7. Since
$\beta = \varphi\alpha$,

$$\Delta_\alpha' = \frac{1}{\hat\sigma^2}\begin{bmatrix} \varphi\otimes r - \hat\beta\otimes X\varphi & -\frac{1}{\hat\sigma^2}(\hat\beta\otimes r) \end{bmatrix} \tag{3.36}$$

$$\ddot L_{\alpha\alpha} = -\frac{1}{\hat\sigma^2}\,\varphi'(X'X)\varphi = -\frac{1}{\hat\sigma^2}D(\delta) \qquad \ddot L_{\alpha\alpha}^{-1} = -\hat\sigma^2D(1/\delta). \tag{3.37}$$

Next, we find maximum directed influence on $\hat\alpha_{k-j+1}$ using the results of Beckman
et al. (1987):

$$v_{\max} \propto \Delta_\alpha'\Upsilon_{\alpha_{k-j+1}} \propto \left(\varphi\otimes r - \hat\beta\otimes X\varphi\right)\frac{\hat\sigma}{\sqrt{\delta_{k-j+1}}}\,d_{k-j+1} \propto \varphi_{k-j+1}\otimes r - \hat\beta\otimes X\varphi_{k-j+1}$$

and

$$C_{\max} = 2\left\|\Delta_\alpha'\Upsilon_{\alpha_{k-j+1}}\right\|^2 = \frac{2}{\hat\sigma^2\delta_{k-j+1}}\left(r'r + \|\hat\beta\|^2\,\varphi_{k-j+1}'X'X\varphi_{k-j+1}\right) = 2\left(\frac{n}{\delta_{k-j+1}} + \frac{\|\hat\beta\|^2}{\hat\sigma^2}\right).$$

Since these expressions are the $j$th eigensolution of $\ddot F^{[\beta]}$, the proof is complete. □

This theorem gives added insight into the direction of maximum curvature.
In short, $v_{\max}$ is the direction that changes $\hat\alpha_k$ the most. Similarly, the $j$th largest
eigenvalue and associated eigenvector specifically correspond to effects on $\hat\alpha_{k-j+1}$.
Also, the sizes of the curvatures show that the estimated coefficients for the most
important principal component regressors are also the least sensitive.



Example 1

Consider the Scottish hills data set described in Chapter 1. The purpose of
this example is twofold: to demonstrate how the $X$-matrix results can be displayed
graphically, and to demonstrate how collinearity affects the curvature values when
examining influence on $\hat\beta$.

Figure 3.7 gives the direction of maximum curvature for directed influence on
$\hat\sigma^2$. The first 35 elements correspond to perturbations of the DISTANCE regressor
values, while elements 36-70 correspond to perturbations of the CLIMB regressor
values. The value for race 18's DISTANCE is the largest contributor to this
influential perturbation. Elements 7 (DISTANCE for race 7) and 53 (CLIMB for
race 18) are also large.

The information in Figure 3.7 can be difficult to discern because the
perturbation vector enters into the likelihood in the form of a matrix, $W$. Another
way to display $v_{\max}$ is to first take its absolute value and then to consider the
elements corresponding to each column of $W$. Each portion can then be plotted
against indices $1, \ldots, n$ and shown together (Figure 3.8(a)). This graphical display
provides a "profile plot" of each column of $W$'s contribution to the direction of
maximum curvature. Here we see again the elements corresponding to observations
18 and 7 as being the most influential. There is also some evidence that measurement
errors in the DISTANCE values (solid line) are more influential than errors in the
CLIMB values (dashed line).

Figures 3.8(b) and 3.8(c) provide profile plots of the absolute value of the
two directional vectors corresponding to $v_1$ and $v_2$ of $\ddot F^{[\beta]}$. The first plot draws
attention to observations 7, 11, and 18, while no observations appear noteworthy in
the second plot.

At this point we turn our attention to the curvature values that accompany
these directional vectors (Table 3.3). The first column shows the curvatures for local


Table 3.3: Hills data; curvatures using $X + W$ perturbation. Model I contains
DISTANCE and CLIMB. Model II contains DISTANCE and CLIMB-RATE.

Estimates in LD(ω)
or LD^[θ₁](ω)       Model I        Model II
$\hat\sigma^2$      λ₁ = 30.47     λ₁ = 28.52
$\hat\beta$         λ₁ = 21.15     λ₁ = 16.41
                    λ₂ = 16.49     λ₂ = 16.25



influence when fitting the regression model for winning time as a linear combination
of DISTANCE and CLIMB. However, as was evident in Figure 1.5(c), the two
regressors are linearly related. An alternative model can be fit by replacing climb in
feet with climb in feet per distance travelled, CLIMB-RATE. This regressor does
not have the strong positive association with distance that the original coding of
CLIMB has. This should reduce the collinearity in the model, thereby reducing
sensitivity to perturbations in the $X$ matrix.

Using standardized regressors, the new model is fit. Column 2 of Table 3.3
shows the curvature values for this new model. $C_{\max}$ for directed influence on $\hat\beta$
has been substantially reduced, indicating that the MLEs for the second model are
indeed less sensitive to perturbations in the $X$ matrix.

Figures 3.9(a), (b) and (c) provide the profile plots of the absolute value
of the directional vectors that accompany the curvatures in Table 3.3, Column II.
All three plots tend to indicate DISTANCE values as being more influential than
CLIMB-RATE values, and all three plots show substantial change from those in
Figure 3.8. The plot for $\hat\sigma^2$ now indicates the DISTANCE values for observations 7
and 18 as influential, rather than just 18. No elements stand out in $v_{\max}$, while case
11's DISTANCE value is a very substantial contributor to $v_2$.


[Figure 3.7: Hills data; $X + W$ perturbation; $v_{\max}$ for $\hat\sigma^2$.]



Example 2

We now consider applying the $X + W$ perturbation to the school data and
examining directed influence on $\hat\beta$. The purpose of this example is to demonstrate
alternative methods for displaying $v_{\max}$ when the number of regressors and/or
observations is large.

Figure 3.10 is the plot of $v_{\max}$ for directed influence on $\hat\beta$. Elements 1-20
correspond to SALARY, elements 21-40 correspond to WCOLR, elements 41-60
correspond to SES, elements 61-80 correspond to TSCORE, and elements 81-100
correspond to MOMED. The two regressors that have little association with other
explanatory variables, SALARY and TSCORE, have small elements. In contrast,
some of the elements for the other regressors are more substantial. This plot does a
fair job of highlighting regressors, but a poor job of highlighting observations.

Figure 3.11 is the profile plot of $|v_{\max}|$. It is easy to see that cases 3 and 18
have multiple regressor values that are influential. In addition, one of the values for
case 2 is influential. However, it is difficult to keep track of which line represents
which column. Another option is to plot "profiles" of each row, as in Figure 3.12.
Here, some of the elements of observations 2, 3, and 18 are highlighted.



[Figure 3.8: Hills data; $X + W$ perturbation; column profiles of directional vectors.
DISTANCE values are connected by a solid line; CLIMB values are connected by a
dashed line. Panels: (a) $|v_{\max}|$ for $\hat\sigma^2$; (b) $|v_{\max}|$ for $\hat\beta$; (c) $|v_2|$ for $\hat\beta$.]



[Figure 3.9: Hills data; Model II; $X + W$ perturbation; column profiles of directional
vectors. DISTANCE values are connected by a solid line; CLIMB-RATE values are
connected by a dashed line. Panels: (a) $|v_{\max}|$ for $\hat\sigma^2$; (b) $|v_{\max}|$ for $\hat\beta$;
(c) $|v_2|$ for $\hat\beta$.]



[Figure 3.10: School data; $X + W$ perturbation; $v_{\max}$ for directed influence on $\hat\beta$.]

[Figure 3.11: School data; $X + W$ perturbation; column profiles of $|v_{\max}|$ for directed
influence on $\hat\beta$. Thick solid line: SALARY; thin solid line: WCOLR; dot-dashed
line: SES; small-dashed line: TSCORE; large-dashed line: MOMED.]



[Figure 3.12: School data; $X + W$ perturbation; row profiles of $|v_{\max}|$ for directed
influence on $\hat\beta$ with large elements labeled.]



A bubble plot of $|W_{\max}|$ provides the best graphical display of the information
in $v_{\max}$ (Figure 3.13). Each element is represented by a point plotted according to
both the corresponding regressor (x-axis) and case (y-axis). Around each point is a
bubble with radius equal to the absolute value of the element, and so large bubbles
indicate large contributors to the direction of maximum curvature. Again, we see
that the elements corresponding to $x_{18,2}$, $x_{18,5}$, $x_{3,2}$, $x_{3,5}$, and $x_{2,3}$ have the largest
influence, as evidenced by having the largest bubbles.



[Figure 3.13: School data; $X + W$ perturbation; bubble plot of $|v_{\max}|$ for $\hat\beta$.]



CHAPTER 4 
LOCAL ASSESSMENT FOR OPTIMAL ESTIMATING FUNCTIONS 

It is possible to examine the local influence of perturbations on measures other
than the likelihood displacement, $LD(\omega)$, and the profile likelihood displacement,
$LD^{[\theta_1]}(\omega)$. Several authors have applied local influence techniques to specific
measures of influence. For instance, Lawrance (1988) and Wu and Luo (1993c)
looked at influence on the Box-Cox transformation parameter. Also, influence
has been examined on the residual sums of squares in regression (Wu and Luo
1993b), deviance residuals and confidence regions in GLMs (Thomas and Cook 1990;
Thomas 1990), and goodness-of-link test statistics in GLMs (Lee and Zhao 1997).
However, considerably less attention has been given to assessing local influence
when estimation is not performed by ML. Some work has been done on extensions
to Bayesian analysis (McCulloch 1989; Lavine 1992; Pan et al. 1996), and $L_p$ norm
estimation (Schall and Gonin 1991).

In this chapter, local influence techniques are extended in two ways. First,
we consider the influence of perturbations on a general class of influence measures.
Second, we address parameter estimates that are found via estimating equations.
The intent is to unify nearly all previous work on this methodology and provide a
conceptual basis and computational tools for future applications. The first three
sections outline the inferential platform, a general class of influence measures,
and assumptions about the perturbations. Next, techniques for local assessment
of perturbations are discussed. In Section 4.4, we give some algebraic results
for the two main diagnostic quantities: the gradient vector and the acceleration
matrix. Section 4.5 contains computational formulas, while Sections 4.6-4.8 contain
applications. Several measures, including $LD(\omega)$ and $LD^{[\theta_1]}(\omega)$, are considered in
Section 4.6 using ML. Section 4.7 addresses influence on the generalized variance
with an emphasis on ML. Finally, the last section considers quasi-likelihood.

4.1 Inferential Platform 
This section outlines assumptions about estimation, the nature of 
perturbations, and how the effects of perturbation are measured. 

4.1.1 Estimating Functions

Suppose that a statistical analysis is performed to estimate $p$ parameters,
$\theta$. Further, suppose that point estimates, $\hat\theta$, are obtained as solutions to a set of $p$
unbiased estimating equations,

$$h(\theta; y) = 0, \tag{4.1}$$

where for each estimating function, $h_i(\theta; y)$,

$$E_\theta\left(h_i(\theta; y)\right) = 0 \qquad i = 1, \ldots, p.$$

Sometimes these estimating equations arise from minimizing or maximizing some
criterion with respect to $\theta$. Examples include least-squares normal equations, score
equations, and equations resulting from minimizing a Bayes risk. Quasi-likelihood
(QL) equations (Wedderburn 1974) were originally motivated by maximizing a
quasi-likelihood function, but the existence of the objective function is not necessary.
Generalized estimating equations (GEEs) (Liang and Zeger 1986), like QL equations,
merely require assumptions about first and second moments.

The theory of estimating equations (EEs) dates back to early work by V. P. 
Godambe (Godambe 1960; Godambe 1976). There has been renewed interest in the 
subject due to new estimation techniques for semi-parametric models and models for 
clustered non-normal responses (Zeger and Liang 1986; Prentice 1988; McCullagh 
and Nelder 1989; Desmond 1997a). A considerable amount of work has centered 



on obtaining optimal estimating functions (OEFs) (Godambe 1976; Crowder 1986;
Crowder 1987; Godambe and Heyde 1987; Firth 1987; Godambe and Thompson
1989; Godambe 1991). To this end, the efficiency matrix of a set of $p$ estimating
functions is defined as

$$\mathrm{Eff}(h) = \left(E\left[\frac{\partial h'}{\partial\theta}\right]\right)^{-1}V[h]\left(E\left[\frac{\partial h}{\partial\theta'}\right]\right)^{-1}. \tag{4.2}$$

Functions that minimize this non-negative definite matrix are considered optimal.
Roughly speaking, this index of efficiency is small when the variance of the
estimating functions is small and the average gradient is large. Provided that the
estimating equations arise from maximizing or minimizing some function, we may
interpret that large gradients indicate the function is not flat around its extremum.

Assuming that a fully parameteric form for the data is assumed, the score 
functions are the unique OEFs up to a scalar multiple (Godambe 1960; Bhapkar 
1972). However, with fewer assumptions about the data, it is generally not possible 
to find a unique set of optimal estimating functions. Instead, optimality is bestowed 
upon estimating functions within a certain class. One such class of estimating 
functions are those that are linear functions of the data. 

Under mild regularity conditions, the asymptotic variance of is in fact given 
by (4.2) (Morton 1981; McCullagh 1983; Crowder 1986). Thus, minimizing Eff(/i) 
is equivalent to minimizing the asymptotic variance of the parameter estimates. 
Note that the first interpretation of the efficiency matrix is appealing in that it is a 
finite sample index, but somewhat awkward in that it is based on a property of the 
estimating functions. The second interpretation is appealing in that it is based on a 
property of the estimates, but limited in that it requires asymptotics. 

For the purposes of applying local influence diagnostics we assume only that 
the estimating functions have been standardized. That is, given a set of unbiased 
estimating functions, h*{d\ y), we utilize the standardized estimating functions given 



90 

by 



hi0;y) = -E 



dh*' 



{W[h*])-'h\ (4.3) 



This standardization has three implications. First, \{h{0;y))~^ is equivalent 
to (4.2) and hence equivalent to the asymptotic variance of the parameter estimates. 
In addition, the standardized estimating functions are optimal in the following sense. 
Given a set of unbiased estimating functions, {/i-(0;y)}, the OEFs amongst the 
class of linear combinations of the h\ are given by (4.3). Hence, the standardized 
estimating functions are an optimal rescaling of the original estimating functions. 
Note that the negative sign is present in (4.3) so that the score equations, rather 
than the negative of the score equation, are considered standardized. Finally, 
standardized estimating functions are "information unbiased" (Lindsay 1982), 
meaning that the following equality holds: 

dh 



-E 



= E[hh'] = V[/i]. (4.4) 



d0\ 

Hence, V[^] = —E[dh/d0']. This fact is used later in the chapter to derive influence 
diagnostics for confidence ellipsoids. 

4.1.2 Perturbations 

Perturbations are again introduced as a vector of q real numbers, a;. These 
perturbations may affect the data, the model and its assumptions, or perhaps even 
just the estimation technique itself. Examples of data perturbations are minor 
changes or rounding errors in the response and the explanatory variables. An 
example of model perturbation is the variance perturbation scheme used in Chapters 
1 and 2. Finally, perturbing the contribution of each observation to the estimating 
functions by case-weights is an example of perturbing the estimation technique. 
This perturbation does not affect the distributional assumptions of the data, but 
rather how each observation is utilized for inference. Another perturbation to the 



91 

inference method would be to allow the value of the hyper-parameters in a Bayesian 
analysis to vary. Though some might argue that the hyper-parameters are part of 
the model, their values are often chosen for computational convenience. 

For the purposes of this chapter, we assume that the perturbation scheme 
results in a set of p perturbed estimating equations, 

' K{0;y)=O, (4.5) 

-•V. - " \-" V ." 

which have solutions 0^. It is also assumed that there exists a null perturbation, Wq, 
whose estimating equations are identical to the original ones. Finally, it is assumed 
that there are no constraints on cj that would limit their values in a neighborhood of 
Wo- For example, constraining the space of variance perturbations to positive values 
is allowed. However, constraining the perturbations to sum to one is not allowed. 
These kinds of constraints could be accommodated, but are not considered here. 

4.1.3 Influence Measures 

The final item to specify is a quantity whose sensitivity to perturbation is of 
interest. This influence measure serves as a barometer for how perturbation effects 
inference. The measure can depend on w directly and indirectly via some r x 1 
vector of functions of the perturbed parameter estimates, 7(^0))- Some influence 
measures, m{u},'y{0^)), may be appropriate regardless of the method of estimation 
and the perturbation, while others may be more application-specific. 

Examples of rather general objective measures are the following: 

.., . mi =x'0^ (4.6) 

m, = -l^ (4.7) 



92 

where E^{yi) = fii^. None of these three measures depend on w directly. The first is 
a linear combination of the perturbed parameters. Here, ^{0^) = x'O^ is a scalar. 
The second measure is the ratio of the i*'' unperturbed parameter estimate to the z*'' 
perturbed estimate. Again, 7(^0;) = ^i„ is a scalar. The third measure resembles a 
Pearson goodness-of-fit statistic based on the perturbed estimates. Since it is not yet 
specified how the means, /ij^, depend on the estimates, we simply write 7(^0;) — 0^. 
The following measures are more application-specific, in that they depend on 
how inference is performed and perhaps even upon the perturbation itself: 

m, = LD{uj) = 2\L{e- y) - L{9^- y)] " (4.9) 

m5 = LD[^^Hu;) = 2[L(0i,^2;?/)-m.;,^(^iJ;y)] (4.10) 

m6 = LD*(a;) = 2[L,(0,;y)-L,(^;y)] (4.11) 



ms = log 



log(^^^) 

, f{y\e^y_ 



(4.12) 
(4.13) 



The first of these measures is the likelihood displacement, for which 7(^0,) = Ouj- 
The second is the profile likelihood displacement, which has 7(^0^)' = (^iw)5(^iw)')) 
where g{6iu) maximizes the profile likelihood of Bi^^ with respect to 62. The third 
measure is a likelihood displacement with a "moving frame" because the base 
likelihood of the measure is the perturbed one. Obviously, these three measures are 
intended for use when estimation is performed by maximum likelihood. 

The influence measure rrij is a Kullback-Leibler divergence between the 
densities f{y \ 9) and f{y \ OJ). The quantity is non-negative, achieving its 
minimum value of zero when the two densities are equivalent. Here, 7(^0;) = ^w 

The last measure addresses sensitivity of asymptotic standard errors. Recall 
that the determinant of a matrix equals the product of its eigenvalues, and when 
the matrix is a variance-covariance matrix for a vector of estimates 0^ the squared 



93 

eigenvalues are proportional to the lengths of the principal axes of a confidence 
ellipsoid for 6. Hence, the determinant of the variance provides a single measure 
of the confidence ellipsoid's size. Provided the estimating functions have been 
standardized, mg is the log of the estimated generalized asymptotic variance of 0. It 
is a function of a; both directly and indirectly via 7(^w) = Out- 

As a final note, it is not necessary for the influence measure to be a function 
of point estimators. In Bayesian analyses, perturbation can be measured by 
the Kullback-Leibler distance between the perturbed and unperturbed posterior 
densities. Such a measure would depend on u;, but not on any point estimates of 
0. McCulloch (1989) developed a general local influence technique for assessing 
perturbation of the prior distribution in a Bayesian analysis. Other types of 
perturbation analysis for Bayesian models are found in Kass et al. (1989), Geisser 
(1991), Geisser (1993), and Weiss (1996). 

4.2 Local Assessment of Perturbations 
At this point we have set the stage for our sensitivity analysis: a perturbation 
is introduced into the inference, and a measure of the effects is chosen. The next 
step is to obtain some useful diagnostics. This section decribes how a Taylor series 
expansion of m and some differential geometry results can be used to describe the 
function. 

4.2.1 First-order Local Assessment 

As in Chapter 1, let ijiJ.u{a) = ujq + av he a vector in the space of 
perturbations. The unit-length vector v determines the direction from Wq and the 
value a determines the distance from a>o in fi. The value of m for a perturbation of 
length a in direction v can be approximated by a first-order Taylor-series expansion 



94 

around the null perturbation: 

m(a) « m(0) + -^(a - 0) 

= m{0)+ap-v (4.14) 

= m(0) + arh'v, 

where the partial derivatives are evaluated at a = 0. The quantity rh'v has a 
geometric interpretation. Consider the influence graph of m (i.e., the influence 
function described as a surface in 3f?'"^^). For each direction v, there is an associated 
lifted line on the influence graph. The slope of this "slice" of the graph is given by 

S = rh'v, (4.15) 

and it is sometimes called the directional derivative. The direction with the largest 
absolute slope is proportional to m, the gradient vector. 

Both the directional derivative and the gradient vector have diagnostic value. 
To explain, recall from Chapter 2 that we may use a perturbation's Euclidean 
distance from the null perturbation, ||ti; — a;o||, as a measure of its size. Further, let 
us define the effect of the perturbation as |m(o) — m(0)|. If m(0) = 0, the effect of 
a perturbation is the absolute value of m{a). The directional derivative can then 
be interpreted as the approximate eff"ect of a size-1 perturbation in direction v. 
Furthermore, suppose that we have perturbations in two different directions: u;^ 
in direction v^ and Wg in direction ug. If 1 5^ | is A; times larger than j^sl, then 
perturbation Wg must be k times larger than a>^ to have the same eflFect. Therefore, 
directional derivatives are measures of the relative influence of perturbations. 
Additionally, the gradient vector shows how perturbations can be scaled to achieve 
maximum local influence. That is, Wq ± (m/||m||) has the largest approximate 
effect, ||m|p, among perturbations of size 1. 



95 

If m(0) is a local minimum, maximum or saddlepoint of the influence graph, 
then the gradient is zero. Here, a first-order local assessment does not provide any 
diagnostic information. Instead, we can turn to a second-order approximation of the 
influence graph. 

4.2.2 Second-order Local Assessment 

A second-order Taylor series expansion of m is given by 

m(a) « m(0) + — (a - 0) + -^^(a - 0)2 

,^, dm a^ , dm^ /^ i«\ 

a^ - 
= m(0) -I- arh'v + —v'Mv, ,. 

where the derivatives are again evaluated at a = 0. The quantity A = v'Mv also 
has a geometric interpretation: it is the rate of change in the slope of the lifted line 
associated with direction v, i.e. it is the acceleration of the curve. The matrix M 
is the acceleration matrix, and the eigenvector associated with its largest absolute 
eigenvector is the direction of maximum absolute acceleration. If the gradient 
vanishes, the second-order approximation simplifies to: 

m{a) = m(0) + —v'Mv. (4.17) 

Li 

Both the acceleration and the acceleration matrix have diagnostic value, 
particulary when the gradient vanishes. In this case, A can be interpreted as the 
approximate eff'ect of a size-\/2 perturbation in direction v. In addition, suppose 
that we have perturbations in two different directions: u;^ and wg. If \Aji\ is k 
times larger than \As\, then perturbation U3s must be \/k times larger than w^ to 
have the same effect. Finally, the eigensystem of the acceleration matrix can be 
used to identify influential perturbations. The largest absolute eigenvector of M 
is the maximum approximate eff'ect of a size-v/2 perturbation, and the associated 



96 

eigenvector is the direction of maximum approximate effect. Other eigensolutions 
can also be used. 

4.3 Acceleration, Curvature and Normal Curvature 
Two more geometric descriptions of the influence measure can be calculated: 
curvatures and normal curvatures. These quantities are functions of both the 
gradient vector and the acceleration matrix. When the gradient vanishes, the 
curvature, normal curvature, and the absolute value of the acceleration associated 
with a directional vector are all equivalent. From (1.5) the curvature associated with 
direction v at the null perturbation is 

i d^m.{a) I 



c 



da'^ 



Using (4.16), it follows that 

\v'Mv\ 



(4.18) 

a=0 



c = 



(1 + v'rnrh'vYI'^ ■■■ ' , . 

(4.19) 

" (i;'(/g + mm»3/2- 

If the slope is zero, then C = |v4|. As mentioned previously, curvature is a measure 
of how quickly a curve turns. 

The notion behind normal curvature is a bit more complicated. The vector 
(u;„(a)', 0)' in W~^^ is projected onto the tangent plane of the influence graph rather 
than the graph itself. A plane is then created that contains both the projection 
as well as the vector that is orthogonal to the plane, the principal normal vector. 
This plane then intersects the surface to create a normal section. Using the tangent 
plane as the basis, curvature of the normal section can be computed as if it were 
a simple plane curve. The resulting normal curvature is a description of how the 
influence graph deviates from its tangent plane. The normal curvature associated 



97 

with direction v at the null perturbation is (Cook 1986) 

(l + ||m||2)V2t;'(J, + mm')r' , ^ " ^ 

If the gradient is zero, then NC = C = \A\. 

The two geometric descriptions NC and C can be distinquished as follows. 
Curvature is the reciprocal of the radius of the best fitting circle to the lifted line 
associated with v, while normal curvature is the reciprocal of the radius of best 
fitting circle to the normal section associated with v. If the gradient is zero, lifted 
lines are normal sections, implying that NC = C = \A\. Additional insights are 
found in Cook (1986), Wu and Luo (1993a), Wu and Luo (1993b), and Fung and 
Kwan (1997). , , ^ 

4.3.1 How to Use Local Assessment of Perturbations 

Local assessment of perturbations can be used to (1) assess the influence 
of aspects of the statistical problem on inferences, and (2) compare the relative 
sensitivity of parameter estimates. 

First, we define the local influence associated with a length-one vector v to 
be: 

{|m'v| if m 7^ 
(4.21) 
\v'Mv\ if m = 0. 

Thus, li{v) is simply the directional derivative unless the gradient vanishes, in which 
case it is the normal curvature. The direction of maximum local influence, Vmax, is 
the main diagnostic tool, and it can be displayed graphically using an index plot or 
star plot to informally assess the influence of perturbations. Additionally, the local 
influences of basic perturbations can be plotted against case indices to compare the 
effects of one- at-a-time perturbations. ; :. 



98 

The relative sensitivity of p parameter estimates can be assessed by plotting 
the p maximum local influences for rrii = Oi^, . . . ,mp = 9p^. Measures with the 
largest maximum local influences are most sensitive to the perturbation. Two 
cautions with this technique should be noted. First, the metric of the parameter 
estimates ought to be comparable. If the metrics are not comparable, standardized 
parameter estimates should be used. The second caution pertains to first- and 
second-order influence. Typically, the set of p measures are assessed by a first-order 
approximation. However, if a subset of the measures requires second-order 
assessment, then the following revised definition can be used to make the local 
influences comparable: 

■ f,'^ ■'■'*•- ■■'■ ■ I \jh'v\ if m 7^0 

li*iv) = < (4.22) 

.,; I \^v'Mv\ if m = 0. 

This ensures that all influence measures are compared by the approximate effects of 
length-one perturbations. . , 

4.4 Algebraic Expressions for Local Assessment 
In this section, algebraic expressions for local influence diagnostics are 
derived. 

4.4.1 First-order Assessment 

Theorem 4.1 

The directional derivative in direction v of influence measure m{uj, ^{9^)) at the null 

perturbation is given by: 

S = {sl + s'^KJ)v, (4.23) 



99 



where 



dm{oj,'y{e)) 



K 



d(jj 



«7 = 



dm{uj,-f{9J)) 



di{e^) 



89' 



89^ 



duj' 



and |o denotes evaluation at the unperturbed fitted model: u> = cjq and 0^ = 0. 
Proof of Theorem 4.1 Taking partial derivatives of m we find 
dm dm duj 



da duj' da 

'dm{u:,'r{0)) 



dijj' 
dm{u},'y{0)) 



duj' 



^ dm{i^,j{0^))d-f{0^) ^^ 

dm{LJ,j{0^))d'y{0^) d0^ , 
e=L d-r{0J d0^ 9u}' 



Evaluating at the original model completes the proof. D 

Corollary 4.1 

The gradient is given by 

m = Sj^ + J'K's~f. 



(4.24) 



4.4.2 Second-order Assessment 

In order to make a general case expression for the acceleration, a large 
number of partial derivatives are needed. The following lemma provides an identity 
used to derive the algebraic expression for M. 
Lemma 4.1 



Define the following: 



Ti 



d(jjd(jj' 



Gi 



dHiO) 



d0d0' 



„ _ d'L 



dwdijj' 



Then, 



T=[lr^ J'] GJ+[K® Ig] P, 



100 



for 



rqxq 



r2 



G 



rpxp 



G2 





r ^ 


:* 


Pi 


Ppqxq = 


P2 




.^p. 



and where J and K were defined in Theorem 4.1. 
Proof of Lemma 4.1 Consider: 



dijjdu}' dui 
_d_ 



89, 



do', duj' 






du: 



7 + E 



d-YiiO^) 



j=i dOj^ 






du: 









+ 



de... 



Evaluating this equality at the null perturbation yields 



Ti — J GiJ 



< 



Stacking the Fj to form F completes the proof: 



J'GiJ 



j'GrJ 



+ 




[Ir ® J'] GJ + 



di2{L}T dy2{L) T 
:rr J-n ttx ■'■n 



ae 



962^ 



= [Ir ® J'] GJ + [K® Jg] p. n 



dwduj' 



dwdu' 



J 



(4.25) 



101 



Theorem 4.2 

The acceleration ofinSuence measure m{u}, 7(^w)) in direction v at the null pertur- 
bation is given by 



A 



dM^,7{0^)) 



= V 



da? 



,d^m{u,MO.)) 



duiduj' 



V 



-h :- = 



v'Mv 



(4.26) 



= V 



Ig J'K' 



-*9 

KJ 



* 



+ [s; ® J'] GJ + [s'^K <^Ig]P\ V, 



where 






dwdu' l0 dud-yiey I" 
d"i{e)duj' I0 d'y(6)d'y(0)< I0 



(4.27) 



Aiternativeiy, we may write 



^ = 1;' 



= t; 



J, j'a:' 



I, J'K' 



-*9 

KJ 



+ Y^8^Ti V 



i=l 



\ 



(4.28) 



+ j'[5]s^,G,]j + ^(s;ii:,)p, 



j=i 



i=l 



V, 



/ 



where s^. is the i*^ element of s^ and Ki is the i*^ column of K. 

Proof of Theorem 4.2 First we define the following {q + r) x 1 vector: 

/ 



S = 



\ 



lie.) 



102 



Then, 



9^m(a;,7(gJ) 
■: da' 



d_ 

da 

d_ 

da 

d_ 

da 



dm{8) d(jj 
duj' da 

dm{6) 
duj' 

dm{5) 
du' 



V 



v + 



dm{6) dv 
du}' da 



=^ V 



= V 



= V 



= V 



= V 



da dojdu;' 
,d^m{S) 



v + 



dudu} 
d_ 
d(jj 

_a_ 

du3 



V. 



dm[5) dS 
dS' du:' 



V. 



dmjS) 
dS' 



dS 



g+r 



fi,.,' "^ 2^ 



du3 



dS' d^mjS) dS 
duj dSdd' duy 

'dS'd'^miS) dS 



i=l 
g+r 

i=l 
r 



dm{S) d^Sj 
d6i dijjdijj' 

dm{6) d^Sj 
dSi duidu' 



V. 



V. 



, V^ dmjS) d'jije^) 



(4.29) 



aw dSdS' du;' -^ 57i(^^) dupdu' 

where the last simplification occurs because d'^cui/divdu}' = 0. Examining the first 
term of d'^Tn{S)/du}du}', and evaluating at the null perturbation we find 



dS' d^m{6) d6 



du dddS' dui' 



dui dui 



I, J'K' 



d^mjS) 
dSdS' 



Jo 



dw 

dui' 

du' 



J 



dujdw' l0 dwd-y(eY l^ 

di(9)du)' lo d'y(e)d'r{eY 1° 






(4.30) 



103 



Examining the second term of d^m{6)/du:du}' and evaluating at the null 
perturbation, we find 



dmjS) g^7i(^J 



E 



= [s; O /,] ([/, ® J'] GJ + [K^ Ig] P) 
= [s; J'] GJ + [s'^K ® /,] P 



(4.31) 



by Lemma 4.1. D 



4.4.3 Comments 



' While the algebraic expression for 5 is quite simple, the expression for A 
is somewhat cumbersome. However, a second-order assessment would typically be 
used only when the gradient vector vanishes. In most situations then, s-y = 0, and 



the acceleration matrix simplifies to 



M = 



I, J'K' 



■*9 

KJ 



(4.32) 



The algebraic expressions are helpful in that they decompose the diagnostics 
into building blocks, most of which are immediately computable. Specifically, the 
quantities 8^, s^, S, G, and K should be immediate upon choosing an infiuence 
measure and fitting the model. In the next section, it is shown that the matrices J 
and P are also easily obtained in most situations. 

4.5 Computational Formulas for J and P 
In order for the formula for the gradient and acceleration matrix to be of 
practical use, they must be easily computed in a wide range of applications. In this 
section, it is shown that the matrices J and P can be computed even when the 
unperturbed parameter estimates are not closed-form. The only requirement is that 
the perturbed estimating functions be closed- form expressions of a; . 



104 

4.5.0.1 Computation of J 

The following theorem generalizes equality (14) in Cook (1986). 
Theorem 4.3 
Let Oij be the solutions to the p perturbed estimating equations hi^{0; y) = 0. Then, 

J=-h~^A, - (4.33) 

where 

dK{e;y) 



A = 



duo' 
Proof of Theorem 4.3 By assumption, we have the following equality 

Taking derivatives with respect to cj' on both sides we find 



(4.34) 



dK{e-y) 






die' 
where is now a. p x q matrix. Evaluating both sides at the original model we find 

A + hJ = 0, 

and the theorem follows directly. D 

Theorem 4.3 has two implications on computation: the gradient vector 
is easy to calculate, and the acceleration matrix is easy to calculate provided 
s^ = 0. Neither h nor A require that the solutions to the estimating equations 
be closed-form for either the perturbed or the unperturbed inferences. In addition, 
the matrix h is often computed as part of the estimation process. Not only is 
it a by-product of using the Newton-Raphson root-finding technique to solve the 
estimating equations, but it serves as an estimate of the asymptotic variance of 0, 
the inverse of (4.4). When estimation is performed by ML, -h~^ is the inverse of 



105 

the observed information matrix. The matrix A is also easily computed provided 
that the expressions for the perturbed estimating functions are known. This 
computation was demonstrated many times in Chapters 1-3 for maximum likelihood. 
Computation of A could be difficult when the estimating equations themselves are 
not closed-form. An example would be ML inference for generalized linear mixed 
models (McCulloch 1997). 



4.5.1 Computation of P 

In this section, a formula for P is given so that it, like J, it can be computed 
without re-estimating the model. To begin, we present the following lemma, which 
provides an identity used to construct P. 
Lemma 4.2 
Let 0^ be the solutions to the p perturbed estimating equations hcj{9; y) = 0. 



where 



-h'i^Ig P = 


I, J' 


*^ 


I, 
J 



l,...p, 



hi = the i*^ column of h 



*i 



Qi Ri 



(4.36) 



Qi 



d'hUe;y) 



9u;9a;' 



Ri = 



d^hiU0;y) 



dujdO' 



hi = 



d'hiU0;y) ■''^' 



dOdO' 



.-■•' ■'• lif 



J = -h A. 



106 



Proof of Lemma 4.2 Taking the z*'* row of (4.35) in the proof of Theorem 4.3 
gives the following equality: 



dhi^{e\y) 



d(jj' 



dhi^{e^;y)d0^ 



0'. 



e^e^ del du^ 

where 0' is a 1 x g matrix. Taking derivatives of the first term on the left side of 
this equality with respect to a;, we find 



(4.37) 



du: 



dhi^{e;y) 



du}' 



0=0^ 



d^hU0;y) 



dijjduj' 



d0^d'hU0;y) 



0=0 ^^ d0dij}' 



0=0^ 



(4.38) 



Taking partial derivatives of the second term with respect to u;, we find 



_9_ 
d(jj 



dhi^{0^;y)d0u 
80',, 9u:' 



d'hi^{0-y) 



du:d0' 



+ 



d0^d'hiUO;y) 



du} 0080' 



9=0^ 



8uj' 



8hi^{0^;y) 89 j 



(4.39) 






8u}8u}' 



Using (4.38) and (4.39), the next step is to take derivatives of the equality in (4.37) 
with respect to oj and then evaluate the result at the null perturbation. It follows 
that 



d_ 
8(jj 



8hi^{0;y) 



8U3' 



dhi^{0^;y)80a 



0=0^ 



80, 



8u}' 



d'hUO;y) 



dujduj' 



+ 



d0^d^hUe\y) 



Q^Q^ 8u} 808(jj' 



+ 



8^hi,{0;y) 



8lj80' 



0=0 Jo 



80^ , 80,8%^{0-y) 



Q^Q 8u}' 8uj 8080' 



0=0. 



00^ 

8U3' 



E 

j=\ 



dhi^{0^;y) 8di, 



89, 



to, dujduj',^ 



(4.40) 



= Qi + J'R'i + RiJ + J'hiJ + Y^ 



dhiu{0;y) 



I J' 



Qi Ri 
Hi hi 



+ 



where is now a, p x p matrix. The lemma follows immediately. D 



I 
J 



89, 



h: ® I, 



107 



The theorem below now provides a formula for P that does not require 
re-estimation of the model. 
Theorem 4.4 *^ ^ s : 
Let 0^ he the solutions to the p perturbed estimating equations hu{0; y) = 0. Then, 



P = 



-1 



h ®[la J'] 



* 



/ 
J 



(4.41) 



where 



* = 



*i 



*. 



(4.42) 



Proof of Theorem 4.4 Taking the p sets of equalities of the form given in 
Lemma 4.2, we find ., 



-hi (8) Jq 



hp ® Ig 



Manipulation of the left side of the equality proceeds as: 



Ig J' 


*1 


I, 

J 




• 


/, J' 




J 





(4.43) 



-^'l ® Iq 






h\\Iq hi2lq 


• 


p = 


h^llq h22lq 


-h'p (2) Ig 








= 


-h®Ig 


P. 



108 



Manipulation of the right side of the equality proceeds as: 



I, J' \ 


*1 


I, 

J 






r 


lo 


J' 


*1 


r 1 


\ 









L 


J 




h 




r -] 








r 


-| 




L J 


J 


I, J' 


*p 


J 






- 


h 


J' 







[I, J'] 

[la J'] 



Ip^ila J'] 



* 



It then follows from (4.43) that 



h^Ir, 



h ®In 



"W 



Ip^lla J'] 



* 



* 



1 


^7 


Ip^ I, J' 


* 




J 


J 


J'] 


* 


I, 


. n 


-1 


J 





Corollary 4.2 



Pi = Y^i-h'^) 



p 



Ig J' 



*. 



= Yi-h''){Qj + RjJ + J'R'j + J'hjJ) 



109 



4.5. 1.1 Remarks 

The formula for P can be combined with the results of Theorem 4.2 to 
further revise the acceleration matrix: 



M = 



dujduj' 



I J'K' 



I J'K' 



I 
KJ 

I 
KJ 



+ [s; ® J'] GJ + [s'^K ® /,] P 



+ [s; ® J'] GJ 



s'^K ® Jj 



h~ ® [ J^ J'] 



* 



I 
J 



I J'K' 



I 
KJ 



+ [s; ® J'] GJ 



+ 



-s'^Kh ®[T J' 



* 



I 
J 



Also, we may write 

p 



P 






1=1 



p p 



/, J' 


*. 


^. 




J 



= EEK^^)(-^'') 



j=l i=l 



/, J' 



*. 



-*9 

J 



= Y^{s'^K{-h-;)) 



j=i 



/, J' 



*, 



(4.44) 



(4.45) 



Hence, another form of the acceleration is given by 



M 



I J'K' 



I 
KJ 



^J'[Y.s„G,]J 



i=\ 



i^^Kh, ') 


I. J' 


*i 


I, 


i=i L J 


J 



Note that in all computations, J = —h A as per Theorem 4.3. 



110 



(4.46) 



4.6 Local Assessment for Maximum Likelihood Inference 
In this section some brief examples when estimation is performed by ML are 
given. The influence measures addressed include linear combinations of parameter 
estimates, ratios of parameter estimates, the likelihood displacement, the profile 
likelihood displacement, and the Kullback-Leibler divergence. Emphasis is placed on 
constructing the building blocks of local influence. From Theorem 4.3, J = —L A. 
The individual measures determine 7(^0;) and the remaining building blocks: 



dmiiuniO)) 



K 






dm{u},'y{e^)) 



36' 



S = 



dj{e^) 






4.6.1 Linear Combinations 

Consider a linear combination of the parameter estimates under perturbation: 



^ti>.: 



m = xOui. 



The building blocks of first-order local assessment are: 



7(00,)= x'K 



s^= 



K= x' 

8r^ = 1. 



Ill 

Construction of the gradient and directional derivative is immediate: 

S^rh'v 

— -x'L Av. 

Note that the direction of maximum absolute slope is proportional to m oc A'L x. 
This is exactly the same as the direction of largest curvature for the profile likelihood 
displacement of x'O^j under the orthogonal parameterization given in Theorem 2.1. 

y* ,. J, ■>. ■ \ 

4.6.2 Ratio of Estimates i« • VV ^' "^^ 

For the measure m = Oi/Oi^^, the building blocks for a first-order local 
assessment are 

where dj is a p x 1 vector with a 1 in the i"* position and zeroes elsewhere. Hence, 
S = rh'v = (si + s'KJ)v = Id'iL'^Av = IV'Av, 

" i +1, " — 1 

where L is the r"' column of L . 

4.6.3 Likelihood Displacement 

Recall that the likelihood displacement is given by 

m(a;,7(^J) = 2[L(^;i/)-L(0<,;2/)]. 

This influence measure requires a second-order local assessment, as is now shown. 
First, it does not depend on w explictly, so 8^^ vanishes. Since the exact form of the 



112 



likelihood has not been specified, we simply write 7(^cj) = 9^. It follows that 



am(a;,7(^J) 



dl{9.) 



dL{e^-y) 



= -2 
56>, 



= -2 



dL{e-y) 



80 



0. 



Hence, the gradient vector is zero. Proceeding with a second order assessment, we 



first note that K = Ip. It follows that 



M = 



I, J'K' 



I J'K' 



I -A'L 



I -A'L 



-1 



■^9 

KJ 

I 
KJ 





-21/ 



+ [s; ® J'] G J + [s'^K ® Jj P 



L A 



-L A 



= A'L \-2L)L ^A 
^-2A'L~^A. 



This is equivalent to the result given in Section 2.1, as per Cook (1986). 

4.6.4 Profile Likelihood Displacement 

The profile likelihood displacement, 

m(a;,7(0J) = 2[L(^i,^2;y) - /^(^w,^(^iu.);2/)], 

also requires a second-order assessment. First, the influence measure does not 
depend on a; explictly, so s^j vanishes. Also, 



O-y 



am(a;,7(0.)) 



di{e^) 



-2 



aL(7(^J);y) 



di{e^) 



-2 



dL{e-y) 



30 



= 0. 



This implies that the gradient vector is zero. 



113 



The next step is to construct the building blocks of a second-order assessment. 
In order to obtain a formula for K, we present the following result, which appeared 
in Cook (1986) as equation (24): 

. dg{9,) 



de\ 



— Llryn Ll 



22 ■'^21 



where 



X = 



L21 L22 



Using this result, we find 



K 



dim 



dO,. 



d9i 






I 

— 1/22 -^21 






(4.47) 



(4.48) 



114 



It follows that the acceleration matrix for the profile likelihood displacement 



IS 



M^'^' = 



I J'K' 



I J'K' 



I J'K' 



I J'K' 



= -2J'K'LKJ 



KJ 



+ [s; ® J'] GJ + [s'^K Ig] P 





" d')(e)di{9)' 10 



'-' flOflOl 



/ 

KJ 



d9d9' 10 



/ 

KJ 





-2L 



I 
KJ 



-2A'L 



= -2A'L 







■t'll ■£'12 

L21 L22 



/., 

22 ^21 



-1/99 Loi 



Z-^A 



-t'li — L12L22 L21 




L A 



-2A'{L ' - 





L22' 



)A. 



This is equivalent to the formula given in Section 2.1, as per Cook (1986). 

4.6.5 Kullback-Leibler Divergence 

The Kullback-Leibler divergence, 



m — Ea 



log(i(?^) 
/(W|9J 



115 



also requires a second-order assessment. The measure does not depend on u; 
explictly, so s^ vanishes. Also, 7(^0;) = 0^, implying 

dm{u;,-f{e^)) 



di{e^) 



= — ttEo 

del 

dE, 



f{y\e.)' 



oe' 



--Ee 



dL{e^-y) 



de, 



-Ee [0] 



0. 



Proceeding with a second-order assessment, we first note that K = Ip. We 



also make the following observation: 



•va(-. 



— ; TrEa 



f{y\e^y 



— : ttEs 



L{e^\y) 



■ . . -. ■ =-^ 

Evaluating this at the null perturbation implies 

' d^L[0-y) 
ddde' 






(4.49) 



'-'77 ~ '^e 



Jo 



and the acceleration matrix follows: 



I J'K' 



I -A'L 



= -AT' E 



/ 
KJ 



s 



77 



-L A 



(4.50) 



d'L{0;y) 
8080' 



l-'a. 



116 



An interesting consequence of this result is the following: if the observed and 
expected information matrices are equal, then the acceleration matrix is proportional 
to the acceleration matrix of the likelihood displacement. 

4.7 First-order Assessment of the Log Generalized Variance 
In this section, some general results for first-order local assessment of the log 
of the generalized variance are presented. In addition, the directional derivative for 
the variance perturbation of a regression model under ML is derived. 

4.7.1 General Form of the Directional Derivative 
Theorem 4.5 
For the measure 



m = log 






the A;*'' element of the vector s^ is given by 

'd'K{0;y) 



—trace 



h 



duJkdO' 
and the f^ element of the vector 8^ is given by 



k = l,...q, 



— trace 



Additionally, we may write 



dejdo' 



h 



j = l,...p. 



s^ = R{-h ) = 



ill -^2 • • • Rp 



vec 



(-^ ), 



where each Rj is a q x p matrix of the form 



Rj — 






3 = l,-P- 



(4.51) 



(4.52) 



(4.53) 



(4.54) 



117 



Similarly, 



8-y = h{—h ) 



hi /i2 hp 



vec{—h ), 



(4.55) 



where each hi is a p x p matrix of the form 



d9d0' 

Note that the matrices Ri and hi appeared earlier in the computational 
formula for P (Theorem 4.4). 
Proof of Theorem 4.5 

To begin, we present the following two matrix differentiation results for a matrix A 
and a scalar t (Searle et al. 1992): 



dAT^ _ _^-.dA 

dt ~ ^ dt^ 



d_ 
dt 



log |A| = trace 



.idA 

'dt 



(4.56) 
(4.57) 



Combining these two identities yields 
d 



dt 



log \ — A ^ I = trace 



— trace 
= —trace 



(-A-')-'l(-A-') 



dA 
df 



(4.58) 



Using (4.58) we find 
. d 



dui 



m{u}, 0u) = —trace 



d^K{e-y) 



dcjkde' 



dK{0^;y) ^^ 



6=9^ 



dO' 



y 



and evaluating at a; = wq and 6 = yields (4.52). Expression (4.53) follows from a 
similar argument. 



118 



The next step is construction of (4.54). First, the trace of product of two matrices 
Arxc and Bcxr is (Searle 1982) 



Letting A*-' denote the ij*'' element oi h , we may write (4.52) as 



p p 



dhU0;y) 






p p 






h'K 



Construction of s^ follows: 



«a, = - 


P P 

1=1 j=i 


3hi,^{e;y) 

aK,i(6\y) 










dhi,^{e;y) 


dhx,^{e;y) 
dOpUJi 


a/i2,u,(e;v) 

deiuji 


dhiAOw) 

aepui 


dhpMOw) 

d6p(jj-i 


= 


dhi,^(e;y) 
dBiuj2 


dhi,^{e;y) 
d9pUi2 


'" <-^' 




■ 




dOlUlq 


dhi,^{e;y) 

dOpUlg 


dh2,u(e;y) 

deiUq 


dh2,u,(6;y) 

dBpUq 


dhp,^(0;y) 

dOpUlq 



vec{~-h ) 



ill R2 • • • Rp 



vec{—h ). 



Construction of s^ proceeds similarly. D 

Corollary 4.3 

For the measure (i.bl), the slope in direction v is 



-1, 



s = vec{-h y 



R' + hJ' 



V, 



and the direction of maximum absolute slope is 



(4.59) 



Umax <X m = 



R + Jh 



vec{h ) . 



(4.60) 



119 



4.7.2 Variance Perturbation for ML Regression 

In this section, first-order local assessment diagnostics are derived for the 
variance perturbation in ML regression with the perturbed generalized variance 
as the objective measure. Because h has a simple form, we derive s^^ and s^ 
element-by-element rather than construct the matrices in (4.54) and (4.55). 
4.7.2.1 Construction of 8t^ 

To derive the i*^ element of s^^ we begin with the following results from 
Section 2.5: 

d^K{e-y) _ 1 



dl3dui 
d'LUO;y) 1 



;{yi - x'i0)xi 



'/3\2 



{yi-x[0) 



(4.61) 
(4.61a) 



where Xj is the covariate vector for the i*'' observation. Taking partial derivatives of 
(4.61) leads to 



d'LUe;y) 
d^d/3'duji 






xnp-i)Xi 



r.XiX^ 






Taking partial derivatives of (4.61a) yields 

d'L^{e;y) 1 



' a\2 



{Vi - <I3) 



After evaluating these expressions at the null perturbation, we can construct the 



matrix 



dK{9-y) 



duide' 



d'L^{0;y) 



dedd'dui 



- ~2XiXi 
r; / 



v4 "^i 



-s 



(4.62) 



120 



Accordingly, the trace of the following matrix multiplication is the i^^ element of s^^: 



dK{0;y) 



dujide' 



{-h ) = 






--^y 
»*^i 



a^x'xy^ 

0' 2(TVn 



XiX^{X X) "^"^i 



^x[{X'X) 



lV\-l 2r^ 



(4.63) 



In order to find a formula for the trace, an expression for the fc"' diagonal element 
of the upper left portion of the matrix is needed. Letting Ujk be the jk*^ element of 
{X'X)-\ the A;*'' diagonal element of Xix'iiX'Xy^ is 

p-i 

'•}■■ 
• . j=l ■ * . 



Hence, the trace of (4.63) is given by 

p-i P-i 2r^ ""^ ^~^ 2r? 

fc=i j=i j=i fc=i 

From this, the vector s^j can finally be constructed: 



■vp — 1 



2rH 



(Xy7=l ^nj Zjfc=l(%fe^nfe)j + „o.2 



X21X2 




an 

1-, 


— 






"21 

_ a2(p-i) _ 



%;K \ 



Xi(p-i)Xi 



Xn{p-l)X^ 



a(p-i)i 



a(p-i)(p-i) 



na 



2 «9 



(4.64) 



Letting a, be the i^^ column of (X'X) ^, it follows that 



121 



r -| 




r -1 






XuX[ 




^i2a:i 






3:21X2 


ai - 


3^21 aj'a 


a2.. 


. — 


_ a;„ix; _ 




_ a;„2< _ 







X2{p-l)X2 



^n(p-l)25„ 



flp-1 



ncr 



2' «9 



XiiX'i 



3:21X2 



a:i(p-i)Xi 

3:2{p-l)X2 



3:nlX„ . . . Xn(p-i)Xj^ 



tti 



ttp-i 



no 



2' «9 



£)(Xi)X ... D(Xp_i)X 



vec(X'X)-^ - — -r 



ncr 



2' «9 



D(Xi) ... D(Xp_i) 



[Vi®X]vec(X'X)-^ 



no" 



2' «9- 



4.7.2.2 Construction of Sy 

Starting from the Hessian of the unperturbed log-likelihood, 



dh{e;y) _dL{e;y) 



d0' 



dOdO' 



-ix'x 



-^(y-X/3) 



■My - x/sy ^-Mv- ^^Yiy - x/3) 



(4.65) 



the following partial derivatives at the null perturbation can be computed: 

d'Lie^y) 



d'L{0;y) 



= 



da^d^dlS' 
d'L{0;y) 



Xx'x 






d'HB; y) 



= -.X'iy - X/3) 



= ttXV = 



d{a^y 



-^e + ~s(y-xf^ny-x^) 



n Sr'r 2n 

a^ (T* a^ ' 



122 



Now the elements of Sj can be constructed: 



dm{u), e^) 



d/3i 



= trace ( 



d'L{0-y) 



= trace ( 

= trace ( 
= 0. 



d/3id0d0' 




i-h- )) 



1 v' 



X'Xi 



1 vi 



'-.x\x 







j,X',X{X'X)-' 



<t2(X'X)-i 
0' 



2£l 

n 



"^X'Xi 

•n ' 







) 



Also, 



dm{(jj, 0^) 



da^ 



— trace ( 
= trace ( 

= trace ( 
p + 3 



d'L{e;y) 



da^dode' 

i(X'X) 



i-h")) 



0' 



^I 



0' 4 



2n 



a2(X'X)-i 



0' ^ 

n 



(4.66) 



(4.67) 



Hence, all but the last element of s^ is zero: 



tJ/y 







123 



4.7.2.3 Directional derivative 



Using the results above and A from Section 2.5, we may finally construct the 
directional derivative of the log of the perturbed generalized variance 

S={8l + s'KJ)v 



= {8l + s'K{-L-)A)v 



-K + 



= « + 



0' 2+3 



Q, {p+Z)a^ 



a2(X'X)-i 
0' 



251 

n 



A)v 



j,X'D{r) 



2&* sq 



)V 






(-[D(Xi) ... D(Xp_i)][Vi®^]vec(A-'X)-i 



\T..^ 



^n' sq ' ~ 9 



p + 3 , 
rsq)v 



= {-[D{X,) ... D(X,_0][Vi®^]vec(X'X)-^ + ^r,,)V 



Hence, the direction of the maximum absolute slope is 



(4.68) 



Umax oc -[ D(Xi) . . . r>(Xp_i) ] [Ip-i ® X] vec(X'X)-^ + ^^r,,. (4.69) 



4.7.2.4 Simple random sample 

As a special case, consider a normally distributed simple random sample 
(SRS). The direction of maximum slope simplifies to 



1 3 

Vmax OC m = -£>(!„) [1 ® 1„] vec(-) + ;jT^r«, 



n na' 



oc 



fsq „ In 



(4.70) 



Hence, observations with squared residuals that deviate largely from ^ have large 
elements in Umax- This may seem odd, but recall that this represents influence on 
an estimated joint asymptotic confidence region for /i^^ and a^. If the log of the 



124 




Figure 4.1: Hills data; Vmax for perturbed log generalized variance of 

variance of these two estimates is considered separately, the directions of maximum 
absolute slope are respectively given by 



»^niax CX T 



sq 



a^r 



^max 0^ ' 



sq 



The direction of maximum slope for (1^ is dominated by squared residuals that 
deviate greatly from the average squared residual, a^, while Umax for a^ is dominated 
by large squared residuals. The direction of maximum slope for the joint influence 
measure is simply a weighted average of these two vectors. 
4.7.2.5 Examples 

As examples, we consider the Scottish hills data and the school data. For 
the hills data, the element of Umax for observation 1 1 is large and the element for 
observation 18 is huge (Figure 4.1). For the school data, elements 3 and 18 show 
substantial influence on the perturbed generalized variance (Figure 4.2). 



125 




Figure 4.2: School data; fmax for perturbed log generalized variance 9 



4.7.2.6 A simpler measure 

The directional derivative simplifies greatly if the measure is based on the 
unperturbed generalized variance evaluated at the perturbed estimates: 



m = log 



' ae' ' 



In this situation, s^ vanishes, but the other derivations follow from above, yielding 
the following results for the variance perturbation: 



c P + 3 , 



na 



2 »9 



'-'max 



p + 3 



''^max CX T 



sq- 



4.8 Local Assessment for Quasi-likelihood Inference 
In this section, local influence assessment of quasi-likelihood estimates 
(Wedderburn 1974) for independent observations is outlined. A thorough treatment 
of quasi-likelihood inference can be found in McCullagh and Nelder (1989). Here, we 
present only some fundamental concepts, viewed from the vantage point of optimal 
estimating equations (Godambe and Heyde 1987; Desmond 1997a). Some local 
influence techniques are then outlined. 



126 

4.8.1 Quasi-likelihood ... 
Consider a model in which a set of n independent observations y depends 

on a set of p parameters through their means ti{0). Although full distributional 
assumptions are not made, we also assume that the variances of y depend on their 
means: V[j/] = a'^V{n). A class of estimating functions that we may consider for 
estimating are given by 

h*i0;y) = W{0){y-fi{0))y, (4.71) 

where the p x n weight matrix W{0) can be a function of the unknown parameters. 
Standardizing (4.71) as in (4.3) yields (Wedderburn 1974; Godambe and Heyde 
1987): 

h{e; y) = %V{t^)-\y - t^m/<^'- (4.72) 

These optimal estimating functions are called the quasi-score functions because 
the solutions to h{0;y) = maximize the following "quasi" -likelihood function, 
provided the integral exists: • 

4.8.2 Local Assessment 

Some preliminary results for local assessment of quasi-likelihood models for 
independent data are presented in this section. Assumptions about the model's 
form are presented, some perturbation schemes are proposed, and a case-weight 
perturbation is applied to an example data set. 
4.8.2.1 Assumptions 

It is assumed that observations are independent, and that the variance is 
related to the mean by the function V(/i). In addition, it is assumed that the mean 



127 



is related to a linear predictor 77 = x'fi through a link function, g: 



gilJ') = V- 



4.8.2.2 Perturbation Schemes 

One perturbation is to introduce case- weights, or equivalently, variance 
perturbations. A set of n perturbations can be introduced into the estimating 
functions in a multiplicative manner: 

hU0;y) = ^D{u:)V{fj,)-'{y-t^{e))/a\ 

Writing the j'*'' perturbed estimating function in summation notation shows how the 
perturbations weight the contributions of the individual observations: 






The matrix A is then given by 

dhUe;y) 



A = 



d(jj' 



1 ^^^'.DiV^) 



a^dO 'V{^i) 

where D{^^^) is a diagonal matrix with i*^ diagonal element {xji - Hi)/V{fii). 

If the response is continuous, then a perturbation can be made to assess 
additive errors in measurement. The perturbed estimating functions are then 



KiO;y) 



'de 



V{n)-^y + u;-n{0))y, 



which leads to 



1 dK{e-y) _ dfi' 
~ dO 



du}' 



v{n-)- 



Other perturbations can also be introduced, including errors in the explanatory 
variables and perturbations to the linear predictor. 



. , 128 

4.8.2.3 Example: Case-weights Applied to Leaf Blotch Data 

As an example, we apply the case-weight perturbation to a data set 
concerning leaf blotch (Wedderburn 1974; McCullagh and Nelder 1989). The 
percentage of leaf area affected by the blotch was recorded for 10 varieties of barley 
grown at 9 sites. Although the response is continuous, it can be analyzed as if arising 
from binomial sampling using the logit link function and a main effects model: 

V{fii) = Hi{l - fjLi) gifii) = log — -^— = T]i. 

Plotting the Pearson residuals from this model against the linear predictor indicates 
that the model overestimates the variance for the largest and smallest proportions 
(Figure 4.3). Applying the case weight perturbation requires h and the following 
matrix: - ,. . , ' ,• 

A = ^X'D{y-ii), 

where X is the model matrix and D{y — fi) is a diagonal matrix of the raw residuals. 
To measure the effects of perturbation, we use the unperturbed Pearson Chi-squared 
statistic evaluated at the perturbed MLEs: 



m 



= E 



This implies that the final building block for a first-order assesment of this measure 
is given by 

dm 



^'' 80 



= X'z,, 



where the n x 1 vector Zi has z*'' element 



2y,/if + yf(l-2/i,)-/^^ 

Ai(i-Ai) 



129 
The direction of maximum local influence is then 

Vraay.CC D{y- ij,)Xh X' Zi- 

Plotting the elements of Vmax against case indices indicates that elements 24 (site 
3 variety 4) and 65 (site 7 variety 5) are influential (Figure 4.4). When plotting 
elements against the corresponding linear predictor, it is clear that observations 
with moderately-sized proportions are more influential (Figure 4.5). 

As in Wedderburn (1974) and McCullagh and Nelder (1989), we consider 
fitting a model with an alternate variance function, V{ijl) = /i^(l — /x)^. This function 
reflects the assumption that extreme proportions are more precise than those arising 
from a binomial distribution. Pearson residuals indicate that the alternate variance 
function is more realistic, but that a relationship between the residual variance 
and the the linear predictor remains (Figure 4.6). Implementing the case- weight 
perturbation and assessing its effects on the Pearson Chi-square statistic requires 

the following quantities: ■: ^ -, 

^' '-'■ '■' - "'' .■- ■; 

0-2 /i(l - /x) 



and 









dm 



X'Z2, 







where the n x 1 vector Z2 has i*'' element 

2{fif - Sfj-fVi + ili{2yf + yj) - yf) 

The direction of maximum local influence is then 



t^max OC D{^^—^)Xh ^X'Z2. 
/i(l-/x) 



130 



■H 1 

n 

01 

u 



Vt^ 



TT' 






• •« 






% • 



-8-6-4-2 2- 

h 

Figure 4.3: Leaf blotch data; Model I; Pearson residuals vs. linear predictor 

A plot of Umax indicates that only observation 24 is influential for this model (4.7). 
The elements have less association with the linear predictor than the binomial model 
(Figure 4.8), indicating that the contribution of the observations to inference is more 
appropriate in the second model. 



rv< 



131 



1 

0.75 

0.5 

0.25 



-0.25 

-0.5 

-0.75 



**<* « * V. * • .i f*^ V*^^^^^ ' "V * «. **' ; **' » J 



20 



40 60 

Case 



80 



Figure 4.4: Leaf blotch data; Model I; Vmax- The first ten indices correspond to site 
1, varieties 1-10, while the second ten indices correpsond to site 2, varieties 1-10 etc. 
The dotted vertical lines are present to help differentiate sites and varieties within 
sites. 



0.6 

0.4 



S 0.2 
I 







-0.2 



—I— I— I— I— I— 



«> i. 'J . '^. .v. a* 



> *" *«■ »': ' '., v.. A'» «v^ 



»• 



-6 -4-2 2 

h 



Figure 4.5: Leaf blotch data; Model I; Umax vs. linear predictor 



132 



.A^'^^' 



2 






■ 

• 




2 






' .*■ • 




.■S 1 

01 

« 
u 




• 


• 


• • 
• 
* • • 
• • • • 


• 
■ 






• .rf'V •.; 


1 • 

1 


-1 


• I 


• 
• 


H ;> • * . 

■ 


1 
• 



-4 -2 
h 



Figure 4.6: Leaf blotch data; Model II; Pearson residuals vs. linear predictor 



1 

0.75 

0.5 

0.25 



-0.25 

-0.5 

-0.75 



. /-^ 


•". 


• 


• 


• 
• 


• 
• 


• 


• 
• 


• 


^' 


.%••• 




. . . 

• • 


• 


< H.O 


• • 

• • 


• 





20 



40 
Case 



60 



80 



Figure 4.7: Leaf blotch data; Model II; Umax- The first ten indices correspond to site 
1, varieties 1-10, while the second ten indices correspond to site 2, varieties 1-10 etc. 
The dotted vertical lines are present to help differentiate sites and varieties within 
sites. 






0.6 



0.4 



S 0.2 







-0.2 



133 



•. • • 



•,'• 



-8 -6 -4 -2 
h 



Figure 4.8: Leaf blotch data; Model II; Umax vs. linear predictor 



CHAPTER 5 
LOCAL ASSESSMENT OF PREDICTIONS 

In this chapter, local assessment in prediction problems is considered. First, 

prediction is motivated by two methods: first principles and predictive likelihood. 

This is followed by a brief discussion of influence measures for estimation and 

prediction. Then, predictive inference and existing local influence techniques for 

linear mixed models are reviewed. These techniques are then extended to assess the 

effects of perturbations on predictions, and three perturbations schemes are applied 

to an example data set. 

5.1 Prediction via First Principles 
Consider the problem of predicting a vector of random variables, u. 
Among predictors u*, u — E(m) minimizes the mean squared-error of prediction, 
MSEP(m*) = E(m — w*)^. The standard error associated with this prediction 



is y/MSEP(w) = y/Y{u). However, prediction is typically not so simple; it is 
usually performed using information from a sample and in the presence of unknown 
parameters. 

5.1.1 Prediction After Data are Observed ? 

Suppose now that prediction of u takes place after observing data y. If 
the two random vectors are correlated, then an improved inference can be made 
about u. Here, the conditional mean, ti = E(w|y), can serve as a point prediction. 
This prediction minimizes the conditional mean square error of prediction, 
CMSEP(m*) = E(u - u*\yy. Since it minimizes the CMSEP for given y, it also . 
minimizes the (marginal) MSEP. This best predictor, ii, differs from u in that it is a 
random variable, rather than a constant. Also, the best predictor's MSEP can be no 

134 



135 
larger than that of it: 

MSEP(m) = V(w) 

= E(V(M|y))+V(E(«|?/)) 

>E{V{u\y)) 

(5.1) 

= E{E{{u-u)'\y)) 
= E(CMSEP(u)) 
= MSEP(w). 

Other predictors can be derived if the class of predictors is restricted. For 
instance, minimizing MSEP among predictors that are linear functions of y results 
in the best linear predictor (Searle et al. 1992): 

BLP{u)^E{u) + CV-\y-E{y)), (5.2) 

where V is the variance of y and C is the covariance of u and y. Although 
not required by its definition, the BLP is an unbiased predictor of u since 
E{BLP{u)) = E{u). 

5.1.2 Prediction After Data Are Observed, but with Unknown Parameters 

Typically the goal of prediction is accompanied by a need to estimate 
unknown parameters 0. Here, we may first estimate the parameters using the 
observed data and then estimate the best predictor. The MSEP of the estimated 
best predictor can then serve as a measure of uncertainty in the prediction, although 
Booth and Hobert (1998) argue that CMSEP is more relevant. 

To illustrate, consider the normal linear regression model with unknown 
parameters, /3, but with known variance, a^. After observing data y, the goal 
is to predict future independent observations u ~ N{Z^,a^Im). From either 
ordinary least squares or ML, = (X'X)-'^X'y. Hence, the estimate of 



136 

E{u\y) = E(m) is Z^, and its mean squared error of prediction can be shown to be 
a^{Z'{X'X)~^Z + Im)- Also, Z^ is normally distributed. These facts can be used 
to create a 100(1 — a)% prediction interval for a single it with covariate vector z: 

z'3 ± Za/2 ^J<r^z'iX'X)-'z + l). (5.3) 

In the event that a^ is unknown, the prediction interval given in (5.3) is 
still valid but cannot be calculated. One option is to simply replace a"^ by s^ to 
produce an approximate prediction interval. A better alternative is to make use of 
the following pivotal quantity: 

u - z'^ 



tn-fc, 



(5.4) 



y/s^{z'{X'X)-'z + l) 
where k is the number of regressors. An exact 100(1 — a)% prediction interval is 
then 

z'0 ± t„^k,a/2\JsHz'{X'X)-^z + 1). (5.5) 

This simple situation highlights the fact that unknown parameters can 
complicate the prediction problem. While pivotal quantities may be available in 
simple situations, accomodating unknown parameters may be more cumbersome 
in complicated models. Therefore, attempts have been made to unify predictive 
inference via predictive likelihood. 

5.2 Predictive Likelihood 
Predictive likelihood is a likelihood-based framework for handling the 
problem of prediction. The goal is to predict w via a point or interval predictor in 
the presence of unknown parameters 0. If u is treated as a vector of parameters, 
then the joint likelihood of u and is 

L*{u,9-y) = f{y,u\e) = h{u\e,y)h{y\e). (5.6) 



137 

A predictive likelihood for u would result by somehow eliminating from the above 
expression. For Bayesians, this can be achieved in a straightforward manner. The 
posterior density of given the data y is obtained, and then (5.6) is integrated over 
the density of 0. The resulting expression is a posterior predictive likelihood for u. 
For frequentists there are several competing options, as outlined in Butler (1986) 
and Bj0rnstad (1990). Here we consider two: the estimative predictive likelihood 
and the profile predictive likelihood. 

5.2.1 Estimative and Profile Predictive Likelihood 

The estimative/empirical predictive likelihood replaces in (5.6) by 0, its 
MLE. The resulting predictive likelihood is actually proportional to the density of u 
given y with the MLE inserted in place of 0. This is considered a naive approach, as 
resulting inferences for u do not take into account the uncertainty in the estimated 
parameters. An alternative frequentist predictive likelihood can be motivated by 
profile likelihood. The idea works as follows. If is considered a set of nuisance 
parameters, then a technique for estimation is to first maximize the joint likelihood 
of {u,0) with respect to for fixed u. The resulting solution, 0{y,u), is then 
substituted into the likelihood. The resulting expression is a function of only u and 
y, and is known as the profile predictive likelihood of u (Mathiasen 1979; Lejeune 
and Faulkenberry 1982; Levy and Perng 1984). 

5.2.2 Point and Interval Predictions 

Regardless of which predictive likelihood is chosen, point and interval 
predictions are of interest. One option for a point prediction is to maximize the 
predictive likelihood, £{u; y) with respect to u. However, the resulting maximum 
likelihood predictor can have poor properties (see Bj0rnstad, 1990, examples 4 and 
7) . An alternative can be motivated by standardizing the predictive likelihood to be 
a predictive density for u: f{u\y) = £{u; y) / f2{y) . The mean of this conditional 



density, the predictive mean, is the preferred point predictor. Standard errors and 
prediction intervals can also be constructed from this predictive density. 
5.2.2.1 Example 

Consider again the situation of predicting future observations in a linear 
regression model with 0' = {0',(t^) unknown. Naively replacing by in the 
conditional distribution of u given y yields the estimative predictive likelihood. 
Hence, inference proceeds under the assumption that u is conditionally distributed 
as normal with mean Z^ and variance a'^Im- A prediction interval for a single u 
with covariate vector z would then be 

In contrast, the conditional distribution of u from standardizing the profile 
predictive likelihood is multivariate t with mean Z0 and dispersion parameter 
na^{I + Z'{X'X)~'^Z), as shown by Levy and Perng (1984). A prediction interval 
for a single u would then be 



z'^±tr,-pa^Jl + z'{X'X)-'z. 

This matches the exact prediction interval given in (5.5). 

5.3 Influence Measures for Prediction 
In this section, some general measures are proposed for assessing the effects of 
perturbation on point estimation and point prediction. Notation is as follows: $ are 
point estimates for parameters and u are point predictions for random variables 
u. Also, let ^^ and u^ denote the perturbed versions of these point inferences, and 
let PL{u;y,0) denote the log of the unstandardized predictive likelihood. 



139 

5.3.1 Predictive Likelihood Displacement 

An overall measure of the change in estimates and predictions u due to 
perturbation is the predictive likelihood displacement: 

PLDiLj) = 2[PL{u; y, ^) - PL{u^- y, 0J]. (5.7) 

A local assessment of this influence measure could be first- or second-order, 
depending on the model and the method of inference. 

Thomas and Cook (1990) considered influence on predictions in generalized 
linear models (GLMs). To explain, let u denote a single future observation that 
is independent of the observed data. Also, let f{-\0) denote the density function 
shared by the observed data and the future observation. Thomas and Cook (1990) 
proposed the following influence measure for assessing the eflfects of perturbations 
on predicting u: > , t / 

d*{u:) = \og f{u\0) -log f{u\0J. (5.8) 

In the context of the article, parameters were estimated by ML and point predictions 
were u = E{u\y,i3). For ML estimation of a GLM, d*{u}) is non-negative and 
minimized at the null perturbation, indicating that a second-order local assessment 
is appropriate. However, in other situations it can be negative. 

The influence measure $d^*(\omega)$ is motivated by a deviance residual for the $i$th observation (McCullagh and Nelder 1989):

$$\log f(y_i \mid \mu_i = y_i) - \log f(y_i \mid \mu_i = \hat{\mu}_i), \qquad (5.9)$$

where $E(y_i) = \mu_i$ and $\hat{\mu}_i$ is the estimated mean of the observation based on the model. In the context of GLMs, each deviance residual is non-negative, with smaller values indicating a better fit. The deviance of a model is defined as the sum of the $n$ deviance residuals and serves as a measure of overall goodness-of-fit.



The influence measure $d^*(\omega)$ can be related to predictive inference in the following way. First, define the following modified predictive likelihood displacement:

$$PLD^*(\omega) = 2[PL(\hat{u}; y, \hat{\theta}) - PL(\hat{u}; y, \hat{\theta}_\omega)]. \qquad (5.10)$$

This differs from $PLD(\omega)$ in that the prediction for $u$ is not perturbed. Also, assuming $y$ and $u$ share the same density and that all elements of $y$ and $u$ are independent, we have

$$\begin{aligned}
PLD^*(\omega) &= 2\Big[\log f(y \mid \hat{\theta}) + \sum_{i=1}^m \log f(\hat{u}_i \mid \hat{\theta}) - \log f(y \mid \hat{\theta}_\omega) - \sum_{i=1}^m \log f(\hat{u}_i \mid \hat{\theta}_\omega)\Big] \\
&= 2\big[\log f(y \mid \hat{\theta}) - \log f(y \mid \hat{\theta}_\omega)\big] + 2\sum_{i=1}^m \big(\log f(\hat{u}_i \mid \hat{\theta}) - \log f(\hat{u}_i \mid \hat{\theta}_\omega)\big) \qquad (5.11) \\
&= LD(\omega) + 2\sum_{i=1}^m d_i^*(\omega),
\end{aligned}$$

where $d_i^*(\omega)$ is the deviance residual of perturbation for the $i$th prediction. Note that the prediction for $u$ is the same in both terms of $d_i^*(\omega)$. Technically then, $d_i^*(\omega)$ is a measure of the perturbation's effects on the estimated conditional mean of $u$ rather than a change in the prediction for $u$.

We note that under the same assumptions given above, the predictive likelihood displacement (5.7) can be written as

$$PLD(\omega) = LD(\omega) + 2\sum_{i=1}^m d_i(\omega),$$

where

$$d_i(\omega) = \log f(\hat{u}_i \mid \hat{\theta}) - \log f(\hat{u}_{i,\omega} \mid \hat{\theta}_\omega).$$

5.4 Estimation and Prediction in Linear Mixed Models

In this section, the linear mixed model is presented with point estimates and point predictions motivated in three ways. Then, local influence techniques for the linear mixed model are reviewed, and additional tools for assessing predictions are presented.

5.4.1 Linear Mixed Models

The linear mixed model (LMM) is formulated as follows:

$$y = X\beta + Zu + \epsilon, \qquad (5.12)$$

where

$$\begin{bmatrix} u \\ \epsilon \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}\right),$$

$y$ is an $n \times 1$ vector of observations, $X$ is a known $n \times p_1$ matrix, $Z$ is a known $n \times p_2$ matrix, $\beta$ is a $p_1 \times 1$ vector of unknown fixed effects, $u$ is a $p_2 \times 1$ vector of unknown random effects and $\epsilon$ is an $n \times 1$ vector of unknown residual errors. The matrices $G$ and $R$ are of order $p_2$ and $n$ respectively.

Littell et al. (1996) provide a host of applications for this rich class of models. When there are no random effects and $R = \sigma^2 I$, this model simplifies to the fixed effects regression model (1.10). The inclusion of random effects and the general residual variance matrix $R$ permits the modeling of correlated data. This is evident by looking at the marginal distribution of the data under model (5.12):

$$Y \sim N(X\beta, ZGZ' + R). \qquad (5.13)$$



The unknowns in (5.12) are the fixed effects $\beta$, the random effects $u$, and the elements of the covariance matrices $G$ and $R$. In this chapter, we will concentrate on assessing the effects of perturbations on inferences for $\beta$ and $u$. Point estimates of $\beta$ and point predictions of $u$ can be motivated by three methods: first principles, mixed model equations, and profile predictive likelihood. See Searle et al. (1992) for information on estimating $G$ and $R$ when these matrices are unknown.



5.4.2 Inference by First Principles

The best linear unbiased estimates (BLUEs) of $\beta$ are the generalized least squares estimates:

$$\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y, \qquad (5.14)$$

where $V = ZGZ' + R$. Here, "best" refers to minimum mean-squared error of estimation. The estimates are equivalent to the MLEs derived from the marginal distribution of $Y$.

The best linear unbiased predictors (BLUPs) for $u$ are given by

$$\hat{u} = GZ'V^{-1}c, \qquad (5.15)$$

where $c = y - X\hat{\beta}$ is the vector of generalized least squares or composite residuals, the latter term being coined by Hilden-Minton (1995). Here, "best" refers to minimum MSEP.

Note that the BLUEs and BLUPs do not require normality, only the model and covariance structure specified in (5.12). Detailed derivations can be found in Searle (1971) and Searle et al. (1992).
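As an illustration of (5.14) and (5.15), the following sketch computes the BLUE and BLUPs for a small simulated one-way layout; all data and names are illustrative.

    import numpy as np

    # simulate a small one-way random effects layout: g clusters of sizes n_i
    rng = np.random.default_rng(2)
    n_i = np.array([3, 5, 4])
    g, n = len(n_i), n_i.sum()
    X = np.ones((n, 1))
    Z = np.zeros((n, g))
    Z[np.arange(n), np.repeat(np.arange(g), n_i)] = 1.0
    sigma2_u, sigma2 = 2.0, 1.0
    G, R = sigma2_u * np.eye(g), sigma2 * np.eye(n)
    u = rng.normal(scale=np.sqrt(sigma2_u), size=g)
    y = 10.0 + Z @ u + rng.normal(scale=np.sqrt(sigma2), size=n)

    V = Z @ G @ Z.T + R
    Vinv = np.linalg.inv(V)
    beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)  # BLUE (5.14)
    c = y - X @ beta_hat                                        # composite residuals
    u_hat = G @ Z.T @ Vinv @ c                                  # BLUP (5.15)
    print(beta_hat, u_hat)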

5.4.3 Inference by the Mixed Model Equations

Henderson et al. (1959) derived estimates for the LMM by maximizing the joint log-likelihood of $u$ and $\beta$:

$$\propto -\tfrac{1}{2}\left[(y - X\beta - Zu)'R^{-1}(y - X\beta - Zu) + u'G^{-1}u\right]. \qquad (5.16)$$

This likelihood is also known as an h-likelihood (Lee and Nelder 1996). The solutions that maximize (5.16) are in fact the same as the BLUEs and BLUPs. The resulting estimating equations are called the mixed model equations (MMEs) and



are usually written

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}. \qquad (5.17)$$

The corresponding set of estimating functions are

$$h(u, \beta; y) = \begin{bmatrix} X'R^{-1}(y - X\beta - Zu) \\ Z'R^{-1}(y - X\beta - Zu) - G^{-1}u \end{bmatrix}. \qquad (5.18)$$
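A quick numerical check of (5.17), reusing the objects from the previous sketch, confirms that solving the mixed model equations reproduces the BLUE and BLUPs:

    # build the MME coefficient matrix and right-hand side of (5.17)
    Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
    C = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                  [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
    rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
    sol = np.linalg.solve(C, rhs)
    beta_mme, u_mme = sol[:X.shape[1]], sol[X.shape[1]:]
    print(np.allclose(beta_mme, beta_hat), np.allclose(u_mme, u_hat))  # True True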



5.4.4 Inference by Profile Predictive Likelihood

The BLUEs and BLUPs can also be derived by the profile predictive likelihood. Recall that inference using the profile predictive likelihood works as follows:

1. Maximize the joint likelihood of $u$ and $\beta$ with respect to $\beta$. Denote the solution as $\hat{\beta}(y, u)$.

2. Substitute $\hat{\beta}(y, u)$ into the joint likelihood. The resulting expression, $L^*(u, \hat{\beta}(y, u); y)$, is the profile predictive likelihood of $u$.

3. Obtain a prediction for $u$ by either of the following methods:

a. Obtain the maximum likelihood predictor (MLP) by maximizing $L^*(u, \hat{\beta}(y, u); y)$ with respect to $u$.

b. Obtain the conditional/predictive mean of $u$ given $(y', \hat{\beta}(y, u))$ from $f(u \mid y, \hat{\beta}(y, u)) \propto L^*(u, \hat{\beta}(y, u); y)$.

4. Obtain point estimates of $\beta$ by substituting the predictor of $u$ from step three into $\hat{\beta}(y, u)$.

By definition, the point inferences from steps 3a and 4 maximize the joint likelihood of $u$ and $\beta$. This implies that using the MLP from the profile predictive likelihood leads to the BLUEs and BLUPs. However, it is not as clear that the solutions from steps 3b and 4 lead to the same solutions, and so the derivation follows. The first step is to maximize the joint likelihood of $u$ and $\beta$ with respect to $\beta$ for given $u$. This leads to the first set of mixed model equations (5.17), which have solution $\hat{\beta}(y, u) = (X'R^{-1}X)^{-1}X'R^{-1}(y - Zu)$. The next step is to show that predicting $u$ with its conditional mean given the data and the estimate $\hat{\beta}(y, u)$ leads to the second set of mixed model equations. Starting with the profile predictive likelihood, we have

$$\begin{aligned}
L^*(u, \hat{\beta}(y, u); y)
&\propto \exp\Big(-\tfrac{1}{2}\big(y - X\hat{\beta}(y, u) - Zu\big)'R^{-1}\big(y - X\hat{\beta}(y, u) - Zu\big) - \tfrac{1}{2}u'G^{-1}u\Big) \\
&= \exp\Big(-\tfrac{1}{2}y'R^{-1}\big(y - X\hat{\beta}(y, u) - Zu\big) + \tfrac{1}{2}\hat{\beta}'(y, u)X'R^{-1}\big(y - X\hat{\beta}(y, u) - Zu\big) \\
&\qquad\quad + \tfrac{1}{2}u'Z'R^{-1}\big(y - X\hat{\beta}(y, u) - Zu\big) - \tfrac{1}{2}u'G^{-1}u\Big) \\
&= \exp\Big(-\tfrac{1}{2}y'R^{-1}\big(y - X\hat{\beta}(y, u) - Zu\big) + \tfrac{1}{2}u'Z'R^{-1}\big(y - X\hat{\beta}(y, u) - Zu\big) - \tfrac{1}{2}u'G^{-1}u\Big),
\end{aligned} \qquad (5.19)$$



since $X'R^{-1}(y - X\hat{\beta}(y, u) - Zu) = 0$. Examining the exponent, we find

$$\begin{aligned}
-\tfrac{1}{2}&\Big(y'R^{-1}(y - X\hat{\beta}(y, u) - Zu) - u'Z'R^{-1}(y - X\hat{\beta}(y, u) - Zu) + u'G^{-1}u\Big) \\
&= -\tfrac{1}{2}\Big(y'R^{-1}y - y'R^{-1}X\hat{\beta}(y, u) - 2u'Z'R^{-1}y + u'Z'R^{-1}X\hat{\beta}(y, u) + u'Z'R^{-1}Zu + u'G^{-1}u\Big) \\
&\propto -\tfrac{1}{2}\Big(-y'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}(y - Zu) - 2u'Z'R^{-1}y \\
&\qquad\quad + u'Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}(y - Zu) + u'Z'R^{-1}Zu + u'G^{-1}u\Big) \\
&\propto -\tfrac{1}{2}\Big(-2u'\big(Z'R^{-1} - Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}\big)y \\
&\qquad\quad + u'\big(-Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}Z + Z'R^{-1}Z + G^{-1}\big)u\Big) \\
&= -\tfrac{1}{2}\Big(u'\big(Z'(I - R^{-1}X(X'R^{-1}X)^{-1}X')R^{-1}Z + G^{-1}\big)u - 2u'\big(Z'(I - R^{-1}X(X'R^{-1}X)^{-1}X')\big)R^{-1}y\Big) \\
&= -\tfrac{1}{2}\Big(u'(Z'P'R^{-1}Z + G^{-1})u - 2u'Z'P'R^{-1}y\Big),
\end{aligned} \qquad (5.20)$$

where $P' = I - R^{-1}X(X'R^{-1}X)^{-1}X'$. It follows that

$$f(u \mid y, \hat{\beta}(y, u)) \propto \exp\Big(-\tfrac{1}{2}\big(u'(Z'P'R^{-1}Z + G^{-1})u - 2u'Z'P'R^{-1}y\big)\Big). \qquad (5.21)$$

This is proportional to the kernel of a normal distribution with a mean of

$$(Z'P'R^{-1}Z + G^{-1})^{-1}Z'P'R^{-1}y, \qquad (5.22)$$

and this predictive mean is the point prediction for $u$.



Meanwhile, substituting the solution from the first set of MMEs into the second set yields

$$Z'R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}(y - Zu) + (Z'R^{-1}Z + G^{-1})u = Z'R^{-1}y. \qquad (5.23)$$

Solving for $u$ provides the BLUP of the random effects,

$$\hat{u} = (Z'P'R^{-1}Z + G^{-1})^{-1}Z'P'R^{-1}y,$$

and this is identical to the predictive mean (5.22). Hence, predicting $u$ with the predictive mean from the profile predictive likelihood yields point inferences for $u$ and $\beta$ that are identical to those from the mixed model equations.

5.5 Local Influence Methods for Linear Mixed Models

5.5.1 Assessment in the Marginal Model

Local influence techniques for linear mixed models have been developed using the marginal model (5.13). Beckman et al. (1987) considered a special case of the linear mixed model known as the variance components model. This model is used for data with $a$ sources of grouping or clustering, where the $i$th source has $g_i$ levels. For $Z_i$ of size $n \times g_i$ and $u_i$ of size $g_i \times 1$, the model is given by

$$y = X\beta + Z_1u_1 + Z_2u_2 + \dots + Z_au_a + \epsilon. \qquad (5.24)$$

This structure implies $Z = [Z_1\ Z_2\ \cdots\ Z_a]$, and $u$ has a total of $p_2 = \sum_{i=1}^a g_i$ elements. In addition to the general LMM assumptions, the $u_i$ vectors are mutually independent, with $V(u_i) = \sigma_i^2 I_{g_i}$ and $V(\epsilon) = \sigma^2 I_n$. The model implies that the marginal variance of $Y$ can be written as

$$V = ZGZ' + R = \sum_{i=1}^a \sigma_i^2 Z_iZ_i' + \sigma^2 I.$$

Beckman et al. (1987) assumed the fixed effects as well as the variance parameters, $(\sigma^2, \sigma_1^2, \dots, \sigma_a^2)$, are jointly estimated by ML. Two perturbation schemes were introduced. The first scheme perturbs the $i$th residual variance to $\omega_i\sigma^2$, while the second scheme introduces a perturbation to the variance of each random effect: the variance of the $j$th element of $u_i$ becomes $\omega_{ij}\sigma_i^2$. While both of these perturbations technically perturb the model assumptions, the major goal is to assess the influence of individual observations and clusters, respectively.

Lesaffre and Verbeke (1998) emphasized applications to longitudinal data. They considered the following linear mixed model:

$$y_i = X_i\beta + Z_iu_i + \epsilon_i, \qquad (5.25)$$

where $i = 1, \dots, g$, $y_i$ is a vector of $n_i$ observations, $X_i$ is an $n_i \times p$ matrix, $Z_i$ is an $n_i \times t$ matrix, $u_i$ is a vector of $t$ random effects and $\epsilon_i$ is a vector of $n_i$ errors. It is assumed that $V(u_i) = G_i$, $V(\epsilon_i) = \sigma^2 I_{n_i}$, and that $y_i$ and $y_{i'}$ are independent for $i \neq i'$. The model permits the inclusion of random slopes, and the matrix $G_i$ can be as general as needed. Also note that the model has a total of $p_2 = gt$ random effects.

In the context considered by Lesaffre and Verbeke (1998), the subscript $i$ corresponds to the $i$th subject, from whom repeated measurements are taken over time. The model implies that the log-likelihood is the sum of contributions from each of the subjects, and by allowing each subject's weight to vary from 1, a subject-level case weight was introduced as a perturbation. Because the perturbation does not correspond to a new density for the data, it would be considered a perturbation of the estimation technique. Diagnostics were primarily based on basic perturbations (i.e., one-at-a-time perturbations) of subject weights. Lesaffre and Verbeke (1998) showed that the basic curvatures could be written as the sum of several terms, and the easily-interpretable terms were used as a means to compare the relative influence of subjects.



5.5.2 Assessment of Predictions

The influence of perturbations on both the BLUEs and BLUPs can be assessed by applying the techniques developed in Chapter 4 to the predictive likelihood displacement (5.7). The estimating functions, $h(\theta; y)$, are those used in the mixed model equations (5.18), and if the perturbed model is also a linear mixed model, the perturbed estimating functions, $h_\omega(\theta; y)$, are also of this form. Local assessment of the predictive likelihood displacement requires the following quantities:

$$\hat{\theta} = \begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix}, \qquad \gamma(\theta_\omega) = \theta_\omega, \qquad m(\omega) = PLD(\omega) = 2[PL(\hat{u}; y, \hat{\beta}) - PL(\hat{u}_\omega; y, \hat{\beta}_\omega)].$$

The next step is to determine whether first- or second-order local assessment is appropriate. Since the influence measure does not explicitly depend on $\omega$, $s_\omega = 0$. Also, by derivation of the mixed model equations,

$$\dot{s}_\gamma = \frac{\partial PLD(\omega)}{\partial \gamma(\theta_\omega)}\bigg|_0 = \frac{\partial}{\partial \theta_\omega}\big(-2PL(\hat{u}_\omega; y, \hat{\beta}_\omega)\big)\bigg|_0 = -2\,\frac{\partial L(\theta; y)}{\partial \theta}\bigg|_{\hat{\theta}} = 0.$$

The implication is that the gradient vector vanishes. Proceeding with a second-order assessment, we note

$$S = \begin{bmatrix} \dfrac{\partial^2 m}{\partial\omega\,\partial\omega'}\Big|_0 & \dfrac{\partial^2 m}{\partial\omega\,\partial\gamma(\theta_\omega)'}\Big|_0 \\[1ex] \dfrac{\partial^2 m}{\partial\gamma(\theta_\omega)\,\partial\omega'}\Big|_0 & \dfrac{\partial^2 m}{\partial\gamma(\theta_\omega)\,\partial\gamma(\theta_\omega)'}\Big|_0 \end{bmatrix}.$$

Using (4.32), the acceleration matrix is then

$$M = -2\Delta'\dot{h}^{-1}\left[\frac{\partial^2 L(\theta; y)}{\partial\theta\,\partial\theta'}\bigg|_{\hat{\theta}}\right]\dot{h}^{-1}\Delta = -2\Delta'\dot{h}^{-1}\Delta, \qquad (5.26)$$



where $\dot{h} = \partial h(u, \beta; y)/\partial\theta'|_0$; the second equality in (5.26) holds because the estimating functions are the gradient of the joint log-likelihood, so that $\partial^2 L(\theta; y)/\partial\theta\,\partial\theta'|_{\hat{\theta}} = \dot{h}$. The matrix $\dot{h}$ is derived as follows. Taking derivatives of the estimating functions with respect to $\theta$ we find:

$$\frac{\partial h(u, \beta; y)}{\partial\theta'} = \frac{\partial}{\partial\theta'}\begin{bmatrix} X'R^{-1}(y - X\beta - Zu) \\ Z'R^{-1}(y - X\beta - Zu) - G^{-1}u \end{bmatrix} = -\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}. \qquad (5.27)$$



The expression is not a function of $\theta$, so (5.27) is in fact $\dot{h}$. It can be shown by inverting this matrix that the variance matrix of the BLUEs and BLUPs is (Searle et al. 1992)

$$-\dot{h}^{-1} = \begin{bmatrix} -\dot{h}^{11} & -\dot{h}^{12} \\ -\dot{h}^{21} & -\dot{h}^{22} \end{bmatrix}, \qquad (5.28)$$

where

$$\begin{aligned}
-\dot{h}^{11} &= (X'V^{-1}X)^{-1}, \\
-\dot{h}^{21} &= -GZ'V^{-1}X(X'V^{-1}X)^{-1}, \\
-\dot{h}^{22} &= (Z'R^{-1}Z + G^{-1})^{-1} + GZ'V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}ZG.
\end{aligned}$$

Once a perturbation is chosen and the perturbed estimating functions are 
constructed, computation of the acceleration matrix (5.26) is straightforward. 
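In code, once $\Delta$ is available the computation is a single linear solve; a minimal sketch (function names illustrative):

    import numpy as np

    def acceleration_matrix(X, Z, Rinv, Ginv, Delta):
        # hdot from (5.27); M = -2 Delta' hdot^{-1} Delta per (5.26)
        hdot = -np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                          [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
        return -2.0 * Delta.T @ np.linalg.solve(hdot, Delta)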

5.5.3 Assessment for Subsets of Point Inferences

Influence measures for subsets of the perturbed point inferences can also be developed. If only the fixed effects are of interest, then assessment can be done using the marginal model (Beckman et al. 1987; Lesaffre and Verbeke 1998). More generally, a profile predictive likelihood displacement can be defined to assess the change in a subset of fixed and random effect point inferences, $\theta_1' = (\beta_1', u_1')$:

$$PLD^{[\theta_1]}(\omega) = 2\big[PL(\hat{u}; y, \hat{\beta}) - PL(\hat{u}_{1\omega}, g_u(\hat{\theta}_{1\omega}); y, \hat{\beta}_{1\omega}, g_\beta(\hat{\theta}_{1\omega}))\big], \qquad (5.29)$$

where $\hat{\theta}_{1\omega}' = (\hat{\beta}_{1\omega}', \hat{u}_{1\omega}')$ comes from $\hat{\beta}_\omega' = (\hat{\beta}_{1\omega}', \hat{\beta}_{2\omega}')$ and $\hat{u}_\omega' = (\hat{u}_{1\omega}', \hat{u}_{2\omega}')$, and $(g_\beta'(\theta_{1\omega}), g_u'(\theta_{1\omega}))$ maximizes $PL(u_{1\omega}, u_2; y, \beta_{1\omega}, \beta_2)$ with respect to $(\beta_2', u_2')$. For linear mixed models, the acceleration matrix follows from the arguments for deriving the acceleration matrix of $LD^{[\theta_1]}(\omega)$ in Section 4.6.4, and is given by

$$M = -2\Delta'\left(\dot{h}^{-1} - \begin{bmatrix} 0 & 0 \\ 0 & \dot{h}_{22}^{-1} \end{bmatrix}\right)\Delta,$$

where $\dot{h}_{22}$ comes from the partition

$$\dot{h} = \begin{bmatrix} \dot{h}_{11} & \dot{h}_{12} \\ \dot{h}_{21} & \dot{h}_{22} \end{bmatrix}.$$

When influence on a linear combination, $x'\hat{\beta} + z'\hat{u}$, is of interest, the results of Section 2.7.1 can be used. Letting $x^{*\prime} = (x', z')$, it follows from the argument of Theorem 2.1 that

$$C_{\max} = \frac{2}{k_1}\,\|\Delta'\dot{h}^{-1}x^*\|^2, \qquad v_{\max} \propto \Delta'\dot{h}^{-1}x^*, \qquad (5.30)$$

where $k_1 = -x^{*\prime}\dot{h}^{-1}x^*$.
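A sketch of (5.30) in code, assuming $\Delta$ and $\dot h$ have already been computed as above:

    import numpy as np

    def vmax_linear_combination(Delta, hdot, xstar):
        # v_max is proportional to Delta' hdot^{-1} x*, normalized to unit length
        v = Delta.T @ np.linalg.solve(hdot, xstar)
        return v / np.linalg.norm(v)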



5.6 One-way Random Effects Model

The one-way random effects model can be used to provide insight into the diagnostic value of locally assessing the linear mixed model. Some preliminary results for this special case of the LMM are presented here, so as not to clutter later explanations.

The (unbalanced) one-way random effects model is

$$y_{ij} = \mu + u_i + e_{ij}, \qquad i = 1, \dots, g, \quad j = 1, \dots, n_i, \qquad (5.31)$$

with $V(u_i) = \sigma_u^2$ and $V(e_{ij}) = \sigma^2$. It is a special case of the variance components model (5.24) with $a = 1$, $X = 1_n$, and $\beta = \mu$, and is also a special case of the two-level model given in (5.25). If all $n_i$ are equal to $n/g$, the design is said to be balanced. It can be shown that

$$\hat{\mu} = \frac{\sum_{i=1}^g \bar{y}_i/(\sigma_u^2 + \sigma^2/n_i)}{\sum_{i=1}^g 1/(\sigma_u^2 + \sigma^2/n_i)} \qquad \text{and} \qquad \hat{u}_i = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_i}\,\bar{c}_i, \qquad (5.32)$$

where $\bar{y}_i$ and $\bar{c}_i$ are the average response and average composite residual, respectively, for the $i$th cluster (Searle et al. 1992).



The building blocks for deriving $\dot{h}$ are as follows:

$$X = 1_n, \qquad Z = \operatorname{diag}(1_{n_1}, \dots, 1_{n_g}), \qquad R = \sigma^2 I_n, \qquad G = \sigma_u^2 I_g,$$
$$X'X = n, \qquad Z'Z = D(n_i), \qquad X'Z = n' = (n_1, n_2, \dots, n_g).$$



It follows that

$$-\dot{h} = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix} = \frac{1}{\sigma^2}\begin{bmatrix} n & n' \\ n & D(n_i) + \frac{\sigma^2}{\sigma_u^2}I_g \end{bmatrix}. \qquad (5.33)$$



The following result for a partitioned matrix with a scalar upper-left sub-matrix can be used to invert the matrix (Searle 1982):

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \frac{1}{c}\begin{bmatrix} 1 \\ -D^{-1}C \end{bmatrix}\begin{bmatrix} 1 & -BD^{-1} \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & D^{-1} \end{bmatrix}, \qquad (5.34)$$



where $c = A - BD^{-1}C$. Applying this result, we find

$$-\dot{h}^{-1} = \frac{1}{k}\begin{bmatrix} 1 \\ -a_1 \\ \vdots \\ -a_g \end{bmatrix}\begin{bmatrix} 1 & -a_1 & \cdots & -a_g \end{bmatrix} + \begin{bmatrix} 0 & 0' \\ 0 & D\!\big(a_i\sigma^2/n_i\big) \end{bmatrix}, \qquad (5.35)$$

where

$$a_i = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_i} \qquad \text{and} \qquad k = \sum_{i=1}^g \frac{1}{\sigma_u^2 + \sigma^2/n_i}.$$

When the design is balanced, $k = g/(\sigma_u^2 + g\sigma^2/n)$ and the following simplifications can be made:



$$-\dot{h}^{-1} = \left(\frac{\sigma_u^2}{g} + \frac{\sigma^2}{n}\right)\begin{bmatrix} 1 \\ -a1_g \end{bmatrix}\begin{bmatrix} 1 & -a1_g' \end{bmatrix} + \begin{bmatrix} 0 & 0' \\ 0 & \frac{ag\sigma^2}{n}I_g \end{bmatrix}, \qquad \text{where } a = \frac{\sigma_u^2}{\sigma_u^2 + g\sigma^2/n}.$$

Of particular interest are the first and second columns of $-\dot{h}^{-1}$:

$$-\dot{h}^{-1}d_1 = \frac{1}{k}\begin{bmatrix} 1 \\ -\dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_1} \\ \vdots \\ -\dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_g} \end{bmatrix} \qquad (5.36)$$

$$-\dot{h}^{-1}d_2 = \frac{\sigma_u^2}{k(\sigma_u^2 + \sigma^2/n_1)}\begin{bmatrix} -1 \\ \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_1} + \dfrac{k\sigma^2}{n_1} \\ \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_2} \\ \vdots \\ \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_g} \end{bmatrix} \qquad (5.37)$$


5.7 Residual Variance Perturbation Scheme

In this section, a residual variance perturbation is proposed. The perturbation is applicable to the general form of the linear mixed model. Simplifications when $R = \sigma^2 I$ are presented, and some closed-form expressions for the one-way model are given.



5.7.1 Form of the Perturbation

Let $\sigma_{ij}$ denote the $ij$th element of the matrix $R$. A set of $n$ perturbations is introduced into the model such that the perturbed residual covariance matrix is

$$R_\omega = \begin{bmatrix}
\sigma_{11}/\omega_1 & \sigma_{12}/\sqrt{\omega_1\omega_2} & \sigma_{13}/\sqrt{\omega_1\omega_3} & \cdots & \sigma_{1n}/\sqrt{\omega_1\omega_n} \\
\sigma_{12}/\sqrt{\omega_1\omega_2} & \sigma_{22}/\omega_2 & \sigma_{23}/\sqrt{\omega_2\omega_3} & \cdots & \sigma_{2n}/\sqrt{\omega_2\omega_n} \\
\vdots & & \ddots & & \vdots \\
\sigma_{1n}/\sqrt{\omega_1\omega_n} & \sigma_{2n}/\sqrt{\omega_2\omega_n} & \sigma_{3n}/\sqrt{\omega_3\omega_n} & \cdots & \sigma_{nn}/\omega_n
\end{bmatrix}.$$

The intent of this scheme is to weight the contribution of individual observations to inference in situations when they do not necessarily enter the likelihood in an additive fashion. The perturbation changes the variances but leaves the correlations among the error terms unchanged. The resulting local influence diagnostics can detect influential observations without upsetting the relationships assumed to exist among the residual errors. The scheme simplifies to the error variance perturbation given for regression when $R = \sigma^2 I$.

5.7.2 Computational Building Blocks

To derive $\Delta = \partial h_\omega(\theta; y)/\partial\omega'|_0$, define $e = y - X\beta - Zu$ and $X^* = [X\ Z]$. We begin with the perturbed joint likelihood. Ignoring terms that do not depend on both $\theta$ and $\omega$, we have

$$\begin{aligned}
L_\omega(u, \beta; y) &= \log\big(f(y \mid \beta, u)\,g(u \mid \beta)\big) \\
&\propto -\tfrac{1}{2}\big[(y - X^*\theta)'R_\omega^{-1}(y - X^*\theta)\big] \\
&= -\tfrac{1}{2}\big[y'R_\omega^{-1}y - 2\theta'X^{*\prime}R_\omega^{-1}y + \theta'X^{*\prime}R_\omega^{-1}X^*\theta\big].
\end{aligned}$$

Taking derivatives with respect to $\theta$, we find

$$h_\omega(\theta; y) = \frac{\partial L_\omega(u, \beta; y)}{\partial\theta} = -\tfrac{1}{2}\big[-2X^{*\prime}R_\omega^{-1}y + 2X^{*\prime}R_\omega^{-1}X^*\theta\big] = X^{*\prime}R_\omega^{-1}(y - X^*\theta).$$

Second partial derivatives are given by

$$\frac{\partial h_\omega(\theta; y)}{\partial\omega_i} = X^{*\prime}\frac{\partial R_\omega^{-1}}{\partial\omega_i}(y - X^*\theta).$$



Evaluating this at the original model, we obtain the $i$th column of $\Delta$:

$$\Delta_i = X^{*\prime}R^{-1}S_iR^{-1}e, \qquad (5.38)$$

where $S_i$ is the symmetric matrix whose $i$th row and column are $(\tfrac{1}{2}\sigma_{1i}, \dots, \tfrac{1}{2}\sigma_{(i-1)i}, \sigma_{ii}, \tfrac{1}{2}\sigma_{(i+1)i}, \dots, \tfrac{1}{2}\sigma_{ni})$ and whose remaining elements are zero.



When $R = \sigma^2 I$, $R_\omega^{-1} = \sigma^{-2}D(\omega)$, and the expression above simplifies, allowing us to write

$$\Delta = \frac{1}{\sigma^2}X^{*\prime}D(e)$$

and

$$M = -2\Delta'\dot{h}^{-1}\Delta = \frac{2}{\sigma^2}D(e)\begin{bmatrix} X & Z \end{bmatrix}\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \sigma^2G^{-1} \end{bmatrix}^{-1}\begin{bmatrix} X' \\ Z' \end{bmatrix}D(e).$$


5.7.3 One-way Random Effects Model

Deriving results for the one-way random effects model in more detail should provide information about the perturbation's diagnostic value. First, consider influence on the intercept parameter estimate, $\hat{\mu}$. This can be accomplished by using either $m = \hat{\mu}_\omega$ or $m = PLD^{[\mu]}(\omega)$ as the influence function. The direction of maximum local influence for both measures is given by



$$\begin{aligned}
v_{\max} &\propto \Delta'\dot{h}^{-1}d_1 \propto D(e)\begin{bmatrix} X & Z \end{bmatrix}\frac{1}{k}\begin{bmatrix} 1 \\ -\frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_1} \\ \vdots \\ -\frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_g} \end{bmatrix} \\
&\propto D(e)\begin{bmatrix} \frac{\sigma^2/n_1}{\sigma_u^2 + \sigma^2/n_1}1_{n_1} \\ \vdots \\ \frac{\sigma^2/n_g}{\sigma_u^2 + \sigma^2/n_g}1_{n_g} \end{bmatrix}
= \begin{bmatrix} \frac{\sigma^2/n_1}{\sigma_u^2 + \sigma^2/n_1}e_1 \\ \vdots \\ \frac{\sigma^2/n_g}{\sigma_u^2 + \sigma^2/n_g}e_g \end{bmatrix},
\end{aligned} \qquad (5.39)$$



where $e_i$ denotes the residuals for cluster $i$. Thus, the direction of largest local influence is proportional to the residuals, but with each cluster scaled according to its size. The coefficients upweight observations in smaller clusters and downweight observations in larger clusters. The reason for this is that $\hat{\mu}$ is not a simple average of the observations: the numerator of $\hat{\mu}$ in (5.32) can be written as

$$\sum_{i=1}^g \sum_{j=1}^{n_i} \frac{y_{ij}}{n_i\sigma_u^2 + \sigma^2}.$$

This indicates that observations from smaller clusters receive more weight. Note that if either the design is balanced or $\sigma_u^2 = 0$, then $v_{\max} \propto e$.



We now consider influence on the predicted random effects. Without loss of generality, let us examine influence on $\hat{u}_1$. The direction of largest local influence is given by

$$\begin{aligned}
v_{\max} &\propto \Delta'\dot{h}^{-1}d_2 \propto D(e)\begin{bmatrix} X & Z \end{bmatrix}\begin{bmatrix} -1 \\ \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_1} + \frac{k\sigma^2}{n_1} \\ \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_2} \\ \vdots \\ \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_g} \end{bmatrix} \\
&= D(e)\begin{bmatrix} \big(-1 + \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_1} + \frac{k\sigma^2}{n_1}\big)1_{n_1} \\ \big(-1 + \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_2}\big)1_{n_2} \\ \vdots \\ \big(-1 + \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_g}\big)1_{n_g} \end{bmatrix}
= \begin{bmatrix} \big(\frac{\sigma^2}{n_1}\sum_{i \neq 1}\frac{1}{\sigma_u^2 + \sigma^2/n_i}\big)e_1 \\ -\frac{\sigma^2/n_2}{\sigma_u^2 + \sigma^2/n_2}e_2 \\ \vdots \\ -\frac{\sigma^2/n_g}{\sigma_u^2 + \sigma^2/n_g}e_g \end{bmatrix}.
\end{aligned} \qquad (5.40)$$

Although the individual cluster sizes play into the weights, it is clear from inspection that the residuals from the first cluster have weights roughly $g - 1$ times greater than the others. In the balanced case, all $n_i = n/g$, and this implies

$$\frac{\sigma^2}{n_1}\sum_{i \neq 1}\frac{1}{\sigma_u^2 + \sigma^2/n_i} = \frac{(g-1)\sigma^2/n}{\sigma^2/n + \sigma_u^2/g}.$$

It follows that

$$v_{\max} \propto \begin{bmatrix} e_1 \\ -\frac{1}{g-1}e_2 \\ \vdots \\ -\frac{1}{g-1}e_g \end{bmatrix}.$$

Thus, in the balanced case, the residuals of the first cluster have weights that are $g - 1$ times larger than the others. In fact, the BLUP of $u_1$ is given by

$$\hat{u}_1 = \frac{\sigma_u^2/g}{\sigma^2/n + \sigma_u^2/g}\cdot\frac{g-1}{g}\left(\bar{y}_1 - \frac{1}{g-1}\sum_{i=2}^g \bar{y}_i\right).$$

Not surprisingly, maximum influence on $\hat{u}_i$ is achieved primarily by perturbing the variance of observations in cluster $i$.



5.8 Cluster Variance Perturbation Schemes

In this section, two cluster-based perturbation schemes are considered for assessing how clusters of observations influence estimation and prediction.

5.8.1 A Cluster-based Case-weight Perturbation

For simplicity, we consider the model given in Lesaffre and Verbeke (1998), (5.25), but with the diagonality constraint for $R_i$ dropped. To perturb the model, we let $G_i$ become $\frac{1}{\omega_i}G_i$ and $R_i$ become $\frac{1}{\omega_i}R_i$ for $i = 1, \dots, g$. The perturbation of both matrices is important: it preserves the correlations among the random effects as well as the correlations among observations in the same cluster. Hence, the only effect of the perturbation is to change the relative weighting of the clusters, allowing us to assess the influence of clusters of observations.



5.8.2 Computational Building Blocks

To derive $\Delta$ we begin with the perturbed joint likelihood:

$$\begin{aligned}
L_\omega(u, \beta; y) &= \log\big(f(y \mid \beta, u)\,g(u \mid \beta)\big) \\
&\propto -\tfrac{1}{2}\big[(y - X^*\theta)'R_\omega^{-1}(y - X^*\theta) + u'G_\omega^{-1}u\big] \\
&= -\tfrac{1}{2}\big[y'R_\omega^{-1}y - 2\theta'X^{*\prime}R_\omega^{-1}y + \theta'X^{*\prime}R_\omega^{-1}X^*\theta + u'G_\omega^{-1}u\big].
\end{aligned}$$

Partial derivatives with respect to the fixed effects $\beta$ are

$$\frac{\partial L_\omega}{\partial\beta} = -\tfrac{1}{2}\big[-2X'R_\omega^{-1}y + 2X'R_\omega^{-1}X\beta + 2X'R_\omega^{-1}Zu\big] = X'R_\omega^{-1}(y - X\beta - Zu) = \sum_{i=1}^g \omega_iX_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i).$$

Partial derivatives with respect to $u$ are

$$\frac{\partial L_\omega}{\partial u} = Z'R_\omega^{-1}(y - X\beta - Zu) - G_\omega^{-1}u,$$

whose block for cluster $i$ is $\omega_i\big(Z_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i) - G_i^{-1}u_i\big)$. The perturbed estimating functions are then

$$h_\omega(\theta; y) = \begin{bmatrix} \sum_{i=1}^g \omega_iX_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i) \\ \big\{\omega_i\big(Z_i'R_i^{-1}(y_i - X_i\beta - Z_iu_i) - G_i^{-1}u_i\big)\big\}_{i=1}^g \end{bmatrix}. \qquad (5.41)$$



It follows that the $i$th column of $\Delta$ is

$$\Delta_i = \frac{\partial h_\omega(\theta; y)}{\partial\omega_i}\bigg|_0 = \begin{bmatrix} X_i'R_i^{-1}(y_i - X_i\hat{\beta} - Z_i\hat{u}_i) \\ Z_i'R_i^{-1}e_i - G_i^{-1}\hat{u}_i \end{bmatrix} = \begin{bmatrix} X_i'R_i^{-1}e_i \\ Z_i'R_i^{-1}e_i - G_i^{-1}\hat{u}_i \end{bmatrix}. \qquad (5.42)$$

This expression can be further simplified. Because the matrices $Z$, $R$, and $G$ are all block diagonal, the BLUPs for the $i$th cluster can be written as

$$\hat{u}_i = (Z_i'R_i^{-1}Z_i + G_i^{-1})^{-1}Z_i'R_i^{-1}c_i, \qquad (5.43)$$

where $c_i$ is the vector of composite residuals for the $i$th cluster. Substituting this into the bottom portion of (5.42), we find

$$\begin{aligned}
Z_i'R_i^{-1}e_i - G_i^{-1}\hat{u}_i &= Z_i'R_i^{-1}(c_i - Z_i\hat{u}_i) - G_i^{-1}\hat{u}_i \\
&= Z_i'R_i^{-1}c_i - (Z_i'R_i^{-1}Z_i + G_i^{-1})\hat{u}_i \\
&= Z_i'R_i^{-1}c_i - (Z_i'R_i^{-1}Z_i + G_i^{-1})(Z_i'R_i^{-1}Z_i + G_i^{-1})^{-1}Z_i'R_i^{-1}c_i \\
&= 0.
\end{aligned}$$



This implies that only the top $p_1$ rows of $\Delta$ are non-zero. The matrix can be written compactly as

$$\Delta = \begin{bmatrix} X'R^{-1}D(e_i) \\ 0_{p_2 \times g} \end{bmatrix}, \qquad (5.44)$$

where $D(e_i)$ is an $n \times g$ block-diagonal matrix whose $i$th block is $e_i$.



The formula has several implications. First, the acceleration matrix for the influence measure $PLD(\omega)$ can be written as

$$\begin{aligned}
M &= -2\Delta'\dot{h}^{-1}\Delta \\
&= -2\begin{bmatrix} D(e_i)'R^{-1}X & 0 \end{bmatrix}\begin{bmatrix} \dot{h}^{11} & \dot{h}^{12} \\ \dot{h}^{21} & \dot{h}^{22} \end{bmatrix}\begin{bmatrix} X'R^{-1}D(e_i) \\ 0 \end{bmatrix} \qquad (5.45) \\
&= -2D(e_i)'R^{-1}X\,\dot{h}^{11}X'R^{-1}D(e_i) \\
&= 2D(e_i)'R^{-1}X(X'V^{-1}X)^{-1}X'R^{-1}D(e_i).
\end{aligned}$$
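A sketch of (5.44) and (5.45) in code, with cluster_id an integer array giving each observation's cluster (names illustrative):

    import numpy as np

    def cluster_weight_acceleration(X, Z, G, R, y, beta_hat, u_hat, cluster_id):
        e = y - X @ beta_hat - Z @ u_hat
        g = int(cluster_id.max()) + 1
        De = np.zeros((len(y), g))                      # n x g block-diagonal D(e_i)
        for i in range(g):
            De[cluster_id == i, i] = e[cluster_id == i]
        Rinv = np.linalg.inv(R)
        V = Z @ G @ Z.T + R
        A = X.T @ Rinv @ De                             # p1 x g
        XtVinvX = X.T @ np.linalg.solve(V, X)
        return 2.0 * A.T @ np.linalg.solve(XtVinvX, A)  # (5.45)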

This matrix has at most $p_1$ distinct non-zero eigenvalues regardless of the number of random effects. In addition, using the results given in Section 5.5.3, the acceleration matrix for influence measure $PLD^{[\beta]}(\omega)$ is equivalent to (5.45). Finally, the acceleration matrix for $PLD^{[u]}(\omega)$ is

$$\begin{aligned}
M &= -2\begin{bmatrix} D(e_i)'R^{-1}X & 0 \end{bmatrix}\left(\dot{h}^{-1} - \begin{bmatrix} \dot{h}_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}\right)\begin{bmatrix} X'R^{-1}D(e_i) \\ 0 \end{bmatrix} \qquad (5.46) \\
&= 2D(e_i)'R^{-1}X\big((X'V^{-1}X)^{-1} - (X'R^{-1}X)^{-1}\big)X'R^{-1}D(e_i).
\end{aligned}$$












5.8.3 One-way Random Effects Model

Using (5.30), $v_{\max}$ for directed influence on a linear combination of the intercept and random effect point inferences is given by

$$v_{\max} \propto \Delta'\dot{h}^{-1}x^* \propto D(e_i)'R^{-1}X\begin{bmatrix} \dot{h}^{11} & \dot{h}^{12} \end{bmatrix}x^* \propto \begin{bmatrix} n_1(\bar{c}_1 - \hat{u}_1) \\ \vdots \\ n_g(\bar{c}_g - \hat{u}_g) \end{bmatrix}, \qquad (5.47)$$

since for the one-way model $D(e_i)'R^{-1}X = \sigma^{-2}\big(n_1(\bar{c}_1 - \hat{u}_1), \dots, n_g(\bar{c}_g - \hat{u}_g)\big)'$ and $[\dot{h}^{11}\ \dot{h}^{12}]x^*$ is a non-zero scalar.

Using the relationship between $\bar{c}_i$ and $\hat{u}_i$ given in (5.32),

$$n_i(\bar{c}_i - \hat{u}_i) = \frac{\sigma^2}{\sigma_u^2}\,\hat{u}_i, \qquad (5.48)$$

so that $v_{\max} \propto \hat{u}$.


This implies that the direction of maximum curvature is proportional to $\hat{u}$, the vector of BLUPs. Thus, we see that the form of $\Delta$ has an interesting consequence when the intercept is the only fixed effect in the model: no matter what linear combination is chosen, the direction of maximum curvature is the same. This anomaly does not occur when $\beta$ has more than one element.




5.8.4 An Alternative Cluster Variance Perturbation Scheme

In this section, the variance perturbation scheme proposed by Beckman et al. (1987) is addressed. For the one-way random effects model, the perturbation is $V(u_i) = \omega_i\sigma_u^2$. As each $\omega_i$ approaches infinity, $V(u_i) \to \infty$ and $V(y_{ij}) \to \infty$. Clusters perturbed to have higher random effects variance are deemed to provide less reliable information for inference. Also, the correlation between observations in the same cluster approaches 1 as $\omega_i \to \infty$:

$$\operatorname{corr}(y_{ij}, y_{ij'}) = \frac{\omega_i\sigma_u^2}{\omega_i\sigma_u^2 + \sigma^2} \to 1.$$

Thus, observations from clusters that are perturbed to have higher random effects variance are deemed more alike, while observations from clusters perturbed to have smaller random effects variance are deemed less alike. This is in contrast to the perturbation given previously, which preserves the intra-cluster correlations. Also, the alternate perturbation does not reweight the clusters' entire contributions to the likelihood. Since it does not correspond to a cluster weight, it provides diagnostic information about the assumption of equal cluster variance rather than the influence of clusters.

By using $V(u_i) = \frac{1}{\omega_i}\sigma_u^2$ rather than $V(u_i) = \omega_i\sigma_u^2$, we can obtain $v_{\max}$ for this perturbation from results for the first cluster perturbation. Using the partial derivatives from Section 5.8.2, we may derive

$$\Delta = \begin{bmatrix} 0_{p_1 \times g} \\ -G^{-1}D(\hat{u}) \end{bmatrix},$$

where $D(\hat{u})$ is the block-diagonal matrix whose $i$th block is $\hat{u}_i$.



The acceleration matrix is then

$$\begin{aligned}
M &= -2\Delta'\dot{h}^{-1}\Delta \\
&= -2\begin{bmatrix} 0 & -D(\hat{u})'G^{-1} \end{bmatrix}\begin{bmatrix} \dot{h}^{11} & \dot{h}^{12} \\ \dot{h}^{21} & \dot{h}^{22} \end{bmatrix}\begin{bmatrix} 0 \\ -G^{-1}D(\hat{u}) \end{bmatrix} \\
&= -2D(\hat{u})'G^{-1}\dot{h}^{22}G^{-1}D(\hat{u}) \qquad (5.49) \\
&= 2D(\hat{u})'G^{-1}\big((Z'R^{-1}Z + G^{-1})^{-1} + GZ'V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}ZG\big)G^{-1}D(\hat{u}).
\end{aligned}$$


This matrix has at most $p_2$ distinct non-zero eigenvalues regardless of the number of fixed effects. In addition, using the results given in Section 5.5.3, the acceleration matrix for influence measure $PLD^{[u]}(\omega)$ is equivalent to (5.49). Finally, the acceleration matrix for $PLD^{[\beta]}(\omega)$ is

$$\begin{aligned}
M &= -2\begin{bmatrix} 0 & -D(\hat{u})'G^{-1} \end{bmatrix}\left(\dot{h}^{-1} - \begin{bmatrix} 0 & 0 \\ 0 & \dot{h}_{22}^{-1} \end{bmatrix}\right)\begin{bmatrix} 0 \\ -G^{-1}D(\hat{u}) \end{bmatrix} \qquad (5.50) \\
&= 2D(\hat{u})'Z'V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}ZD(\hat{u}).
\end{aligned}$$
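For the one-way model, where there is one random effect per cluster so that $D(\hat u)$ is an ordinary diagonal matrix, a sketch of (5.50) (names illustrative):

    import numpy as np

    def alt_cluster_M_beta(X, Z, G, R, u_hat):
        # Delta = [0; -G^{-1} D(u_hat)]; M for PLD^[beta] reduces to (5.50)
        V = Z @ G @ Z.T + R
        B = X.T @ np.linalg.solve(V, Z @ np.diag(u_hat))   # X'V^{-1}Z D(u_hat)
        XtVinvX = X.T @ np.linalg.solve(V, X)
        return 2.0 * B.T @ np.linalg.solve(XtVinvX, B)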

As a special case, consider the one-way random effects model. Using Theorem 2.1, the direction of maximum curvature for $\hat{\mu}$ is

$$v_{\max} \propto \Delta'\dot{h}^{-1}d_1 \propto -D(\hat{u})\begin{bmatrix} \frac{1}{\sigma_u^2 + \sigma^2/n_1} \\ \vdots \\ \frac{1}{\sigma_u^2 + \sigma^2/n_g} \end{bmatrix} = -\begin{bmatrix} \frac{\hat{u}_1}{\sigma_u^2 + \sigma^2/n_1} \\ \vdots \\ \frac{\hat{u}_g}{\sigma_u^2 + \sigma^2/n_g} \end{bmatrix}. \qquad (5.51)$$

Here, the elements corresponding to larger clusters receive more weight. This is in contrast to the first cluster perturbation scheme, for which $v_{\max} \propto \hat{u}$.
The direction of maximum curvature for $\hat{u}_1$ is given by

$$v_{\max} \propto \Delta'\dot{h}^{-1}d_2 \propto -\begin{bmatrix} \big(\frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_1} + \frac{k\sigma^2}{n_1}\big)\hat{u}_1 \\ \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_2}\hat{u}_2 \\ \vdots \\ \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2/n_g}\hat{u}_g \end{bmatrix}, \qquad (5.52)$$

where $k$ was defined after (5.35). In the balanced case the coefficient for the first element is roughly $1 + g\sigma^2/(n_1\sigma_u^2)$ times larger than the others, indicating that influence on the $i$th BLUP is accomplished largely by perturbing the variance of $u_i$.
The direction of maximum curvature for $\mu + u_1$ is even more dominated by the first element:

$$v_{\max} \propto \Delta'\dot{h}^{-1}(d_1 + d_2) \propto \begin{bmatrix} \big(1 - k(\sigma_u^2 + \sigma^2/n_1)\big)\frac{\hat{u}_1}{\sigma_u^2 + \sigma^2/n_1} \\ \frac{\hat{u}_2}{\sigma_u^2 + \sigma^2/n_2} \\ \vdots \\ \frac{\hat{u}_g}{\sigma_u^2 + \sigma^2/n_g} \end{bmatrix}. \qquad (5.53)$$

Here, the first element is approximately $g - 1$ times larger than the other elements.




5.9 Application to Corn Crop Data 
In this section, the residual variance perturbation and cluster variance 
perturbation schemes are applied to an agricultural data set. After briefly reviewing 
the data set and goals of the analysis, local influence techniques are applied. 
Graphical results indicate that the data set has an influential observation. 

5.9.1 Data

Battese et al. (1988) analyzed reported areas of corn crops from 12 Iowa counties using a variance components model. Keeping the same notation as in Battese et al. (1988), the model is

$$y_{ij} = \beta_0 + u_i + \beta_{corn}x_{1ij} + \beta_{soy}x_{2ij} + e_{ij},$$

where

$y_{ij}$ = reported hectares of corn in segment $j$ from county $i$,
$x_{1ij}$ = no. of pixels of corn in segment $j$ from county $i$ as per satellite imagery,
$x_{2ij}$ = no. of pixels of soy in segment $j$ from county $i$ as per satellite imagery,
$u_i \sim$ i.i.d. $N(0, \sigma_u^2)$,
$e_{ij} \sim$ i.i.d. $N(0, \sigma^2)$,
$i = 1, \dots, 12$, $j = 1, \dots, n_i$,
$(n_i) = (1, 1, 1, 2, 3, 3, 3, 3, 4, 5, 5, 6)$, $\sum n_i = 37$.

Of particular note in the data set is observation 33. The reported hectares for this case are exactly the same as those of the previous observation despite having vastly different covariate values. Battese et al. (1988) felt that the value was misrecorded, and did not include it in their analysis.









Appendix B contains model-fitting information from an analysis in which the observation is included. Note that the regressors are also centered and scaled for this analysis. Additionally, the variance components have been estimated by restricted maximum likelihood (REML). By not perturbing the REML estimating functions, the local influence techniques described in this chapter can be used to assess the effects of perturbations on $\hat{\beta}$ and $\hat{u}$ while effectively treating the variance components as known.

The goal of the analysis is to first establish a linear relationship between the satellite data, which is readily available, and the actual reported data, which is accurate but not available for all segments in all counties. The resulting model can then be used to make predictions for segments in which reported hectares are unavailable.

Interest centers on predicting the population mean hectares of corn per segment. Because the satellite data are available for all segments in each of the 12 counties, the population mean number of pixels for corn and soy in county $i$ is known. These means are denoted $\bar{x}_{1i(p)}$ and $\bar{x}_{2i(p)}$, respectively. The corresponding population mean hectares of corn per segment in county $i$ is then defined as

$$\mu_i = \beta_0 + u_i + \beta_{corn}\bar{x}_{1i(p)} + \beta_{soy}\bar{x}_{2i(p)}.$$

Point inferences for these values can be made using the corresponding BLUPs:

$$\hat{\mu}_i = \hat{\beta}_0 + \hat{u}_i + \hat{\beta}_{corn}\bar{x}_{1i(p)} + \hat{\beta}_{soy}\bar{x}_{2i(p)}.$$
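A sketch of this computation; the estimate values and population-mean arrays are hypothetical inputs:

    import numpy as np

    def county_mean_predictions(beta0, beta_corn, beta_soy, u_hat, x1bar_p, x2bar_p):
        # mu_hat_i = beta0 + u_hat_i + beta_corn*x1bar_i(p) + beta_soy*x2bar_i(p)
        return (beta0 + np.asarray(u_hat)
                + beta_corn * np.asarray(x1bar_p)
                + beta_soy * np.asarray(x2bar_p))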

5.9.2 Residual Analysis

Figure 5.1 contains diagnostic plots of the corn data. The first plot is of the 37 composite residuals $c = y - X\hat{\beta}$. The three single-observation counties appear first, followed by the county with two observations, etc. Cases from the same county (cluster) are joined by a line, a practice suggested by Hilden-Minton (1995). This plot indicates that observation 33 is the most outlying from the overall estimated regression curve, $X\hat{\beta}$. Figure 5.1(b) shows the residuals $e = y - X\hat{\beta} - Z\hat{u}$. This shows that even when taking into account the county effects, observation 33 is the most outlying. Figure 5.1(c) shows both sets of residuals together, with the composite residuals represented by stars, and the residuals represented by triangles. We note that the largest difference among them is for counties 5 and 11, where the composite residuals are nearly all smaller. Not surprisingly, counties 5 and 11 have the largest BLUPs in Figure 5.1(d). Also, because the residual for case 33 is large and a different sign than the other residuals for county 12, it appears that $\hat{u}_{12}$ would be considerably larger if not for observation 33. A local influence analysis can assess this possibility more thoroughly.

5.9.3 Scheme 1: Residual Variance Perturbation

Figure 5.2 contains local influence diagnostics for perturbing the error variance in the corn model. The first plot shows the basic curvatures, $C_b$, that result from one-at-a-time perturbations. Observation 33 is considerably more influential than the others owing to its noticeably large curvature. The element for case 33 dominates the direction of largest curvature (Figure 5.2(b)). If all elements were equally contributing, then they would equal either $1/\sqrt{37} = .164$ or $-.164$. Therefore, the value for element 33 is over five times larger than the value of equally-contributing elements (e.c.e.). Plots (c), (d), and (e) are the directions of maximum curvature for directed influence on the intercept, the CORN slope and the SOY slope respectively. The plots indicate that case 33 is a dominant contributor to the SOY slope in particular. Plot (f) displays the largest curvatures for directed influence on the 12 predicted random effects. This plot shows which BLUPs are most sensitive to perturbation. Here we see that county 12's BLUP is the most sensitive to the case-weights. This is the county containing the questionable data point.
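The diagnostics in these plots can be read directly off the acceleration matrix; a sketch, following the one-at-a-time and maximum-curvature conventions of Chapter 2 (names illustrative):

    import numpy as np

    def curvature_summaries(M):
        # basic curvatures C_b: normal curvature in each coordinate direction
        C_b = np.abs(np.diag(M))
        vals, vecs = np.linalg.eigh(M)
        i = int(np.argmax(np.abs(vals)))
        return C_b, np.abs(vals[i]), vecs[:, i]         # C_b, C_max, v_max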



[Figure 5.1: Corn data; diagnostic plots. (a) Composite residuals; (b) Residuals; (c) Residuals and composite residuals; (d) County BLUPs.]



[Figure 5.2: Corn data; Scheme 1; local influence diagnostics. (a) Basic curvatures; (b) $v_{\max}$; (c) $v_{\max}$ for $\beta_0$; (d) $v_{\max}$ for $\beta_{corn}$; (e) $v_{\max}$ for $\beta_{soy}$; (f) $C_{\max}$'s for each of the $\hat{u}_i$.]

Figure 5.3 shows the directional vectors $v_{\max}$ for directed influence on the individual BLUPs. Typically, it would be unnecessary to examine all of these plots, especially if there are a large number of random effects in the model. Nonetheless, all plots have been included here to more fully demonstrate their behavior. Upon inspection, maximum perturbation of $\hat{u}_i$ is generally achieved by perturbing cases in cluster $i$. This is not surprising given the closed-form results for the one-way random effects model in Section 5.7.3. Only in plots (a) and (g) do we see an observation in another county exerting influence. In both cases, observation 33 is the influential out-of-cluster data point.

Figure 5.4 contains local influence diagnostics associated with point inferences for linear combinations of fixed and random effects. Recall that in Chapter 2, MaxFITS was proposed as a measure of how sensitive fitted values are to residual variance perturbation, while BasicFITS was proposed as a measure of how much a single observation determines its fitted value. These diagnostics are $C_{\max}$ and $C_b$ values for $M$, respectively. Figure 5.4(a) shows MaxFITS (stars) and BasicFITS (triangles). The values for MaxFITS indicate that the predictions of cases 33 and 34 are noticeably more sensitive to perturbation than the other predictions. The sensitivity of prediction 33 is largely a function of perturbing this case, as demonstrated by the fact that its BasicFITS value is nearly as large as its MaxFITS value. This is confirmed by $v_{\max}$ for directed influence on the fitted value of case 33: element 33 is by far the largest contributor (plot (c)). However, the sensitivity of case 34's prediction is not largely a function of perturbing case 34, as demonstrated by a moderately sized BasicFITS value. The source of the influence is in fact case 33, as is evident in the plot of $v_{\max}$ for directed influence on the fitted value of case 34 (plot (d)). In short, the fitted values for observations 33 and 34 are both sensitive because they are greatly influenced by the model's assumption that observation 33 is just as reliable as the rest of the data.



[Figure 5.3: Corn data; Scheme 1; directed influence on individual BLUPs. Panels (a) through (l) show $v_{\max}$ for $\hat{u}_1$ through $\hat{u}_{12}$.]

Of course, the main goal of the analysis is to obtain predictions for the mean hectares per segment in each of the 12 counties. To determine whether these predictions are sensitive to Scheme 1, a variation of BasicFITS and MaxFITS is used. First, the 12 $C_{\max}$ values for directed influence on county mean predictions are obtained. These are used to assess the relative sensitivity of the predictions. Next, the 37 basic curvatures, $C_b$, are calculated for individual segments as directed upon the prediction for the corresponding county. This allows us to examine how much one-at-a-time perturbations within a cluster affect the prediction of that cluster's predicted mean. Finally, the 12 $C_{\max}$ values and the 37 $C_b$ values are plotted together (Figure 5.4(b)). Here the $C_b$ values are plotted against case indices, and the $C_{\max}$ values are plotted against the case indices of observations in the corresponding county. (Plot (b) is essentially a BasicFITS/MaxFITS plot when only one prediction is of interest in each cluster.) The implications for the corn analysis are two-fold. First, the prediction for county 12 is clearly the most sensitive, and this is due largely to the effects of perturbing case 33. Second, the influence that case 33 showed on $\hat{\beta}_{soy}$, $\hat{u}_1$, and $\hat{u}_7$ (Figures 5.2e, 5.3a, 5.3g) does not translate into noticeable influence on the overall predicted means in the other 11 counties.

5.9.4 Scheme 2: Cluster-based Variance Perturbation

Having seen the influence of perturbing case 33's variance, we may want to assess the effects of perturbing the variance associated with county 12. To this end, we implement the cluster-based perturbation scheme suggested in Section 5.8.1. Since there are only 12 perturbations, we implement the strategy mentioned in Section 2.6, including the benchmarks $\pm 2/\sqrt{12} = \pm.577$ and $\pm 3/\sqrt{12} = \pm.866$ as horizontal lines. This should prevent us from confusing somewhat large elements with truly prominent ones. Also, directional vectors are rendered as star plots as well as index plots. Recall that a star plot is merely an alternative representation that suppresses the signs of the elements, but does not give the illusion of ordering that index plots do. Each element is displayed as a point, with the distance from the origin equal to its absolute value. The plot is completed by labelling the points with indices and drawing rings to represent the benchmarks $2/\sqrt{12}$ and $3/\sqrt{12}$.



[Figure 5.4: Corn data; Scheme 1; directed influence on predictions. (a) BasicFITS and MaxFITS; (b) $C_{\max}$ and $C_b$ for predicted mean hectares per segment; (c) $v_{\max}$ for $\hat{y}_{33}$; (d) $v_{\max}$ for $\hat{y}_{34}$.]

Figure 5.5 contains index plots and star plots of the three distinct eigenvectors resulting from the cluster perturbation. Element 12 of $v_{\max}$ is almost three times the value of equally contributing elements (plots (a) and (b)). Element 11 of $v_2$ is nearly two times that of e.c.e. (plots (c) and (d)), while element 8 of $v_3$ is about two times that of e.c.e. (plots (e) and (f)). Nonetheless, the second two eigenvectors tend to be more noise than signal.

To explore further how cluster 12 is affecting inference, Figure 5.6 gives directions of maximum curvature for directed influence on each of the individual BLUEs under the original parameterization. Element 12, together with element 11, appears prominently in $v_{\max}$ for both of the slope estimates. This indicates that the slopes, rather than the intercept, are being influenced by county 12.

To assess effects of the perturbation on the set of $\hat{u}_i$, Figure 5.7 gives the three eigenvectors of $M^{[u]}$. The direction of largest curvature shows that element 11 is more than two times that of equally contributing elements, while element 12 is roughly three times that of e.c.e. in the second eigenvector. Finally, element 8 is most prominent in the third eigenvector. Because element 12 stands out in $v_2$, but not $v_{\max}$, these plots are inconclusive as to whether county 12 is influential as a whole on the BLUPs.

Finally, we address the main question of interest: whether or not the predicted mean hectares per segment are sensitive to the perturbation. Since there is only one prediction per county, we can use the perturbation scheme to produce cluster-based analogs of BasicFITS and MaxFITS (Figure 5.8). The MaxFITS values show that all of the predicted means show comparable sensitivity to perturbation.



[Figure 5.5: Corn data; Scheme 2; local influence on $\hat{\theta}$. (a) $v_{\max}$; (b) $|v_{\max}|$ star-plot; (c) $v_2$; (d) $|v_2|$ star-plot; (e) $v_3$; (f) $|v_3|$ star-plot.]



[Figure 5.6: Corn data; Scheme 2; $v_{\max}$'s for individual BLUEs. (a) $v_{\max}$ for $\hat{\beta}_0$; (b) $|v_{\max}|$ for $\hat{\beta}_0$ star-plot; (c) $v_{\max}$ for $\hat{\beta}_{corn}$; (d) $|v_{\max}|$ for $\hat{\beta}_{corn}$ star-plot; (e) $v_{\max}$ for $\hat{\beta}_{soy}$; (f) $|v_{\max}|$ for $\hat{\beta}_{soy}$ star-plot.]

The small BasicFITS values indicate that none of the counties is self-predicting: not even county 12.

5.9.5 Scheme 3: Alternate Cluster-based Variance Perturbation

The alternate cluster variance scheme can also be applied to the data set. The acceleration matrix of the likelihood displacement has twelve non-zero eigenvalues. The first two eigenvectors appear in Figure 5.9 as index plots and star plots, and both vectors implicate elements 5 and 11 as being influential. The other ten eigenvectors are dominated by a single unique element, and are not shown.

Figure 5.10 shows the directions of maximum curvature for directed influence on each of the individual BLUEs. Although a couple of elements exceed the $2/\sqrt{12}$ benchmark, the plots are relatively noisy. The small value for element twelve in all of the plots suggests that perturbing county 12's random effect variance has little effect on the fixed effects estimates.

Finally, Figure 5.11 provides the BasicFITS/MaxFITS plot for the alternate cluster perturbation. The first thing we notice about the plot is that the BasicFITS values are nearly as large as the MaxFITS values. This suggests that each $\hat{\mu}_i$ is most sensitive to the perturbation of $V(u_i)$. We also recall that for the one-way random effects model, the first element of $v_{\max}$ for $\mu + u_1$ was roughly $g - 1$ times larger than the other elements. Hence, Figure 5.11 suggests that this dominance may be maintained in more complicated models. If this is true, it may be sufficient to only plot the MaxFITS values. For the corn data, the MaxFITS values indicate that the predicted means for counties with large BLUPs (counties 5 and 11) are most sensitive to the perturbation of the cluster variance.

5.9.6 Conclusions and Action

Having examined these plots, the analyst should be very wary of observation 33.



[Figure 5.7: Corn data; Scheme 2; local influence on $\hat{u}$. (a) $v_{\max}$; (b) $|v_{\max}|$ star-plot; (c) $v_2$; (d) $|v_2|$ star-plot; (e) $v_3$; (f) $|v_3|$ star-plot.]




[Figure 5.8: Corn data; Scheme 2; local influence on predicted means: cluster-based BasicFITS and MaxFITS.]



[Figure 5.9: Corn data; Scheme 3; local influence on $\hat{\theta}$. (a) $v_{\max}$; (b) $|v_{\max}|$ star-plot; (c) $v_2$; (d) $|v_2|$ star-plot.]

Scheme 1 indicates that if its precision varies from that of the other segments in county 12, the county's predicted mean hectares of corn per segment is affected (Figure 5.4(b)). Using the weighted county perturbation scheme leads to inconclusive evidence about county 12's overall influence. Although element 12 figures prominently in the $v_{\max}$'s for the two slope estimates (Figure 5.6), even the 12th county's predicted mean appears unaffected (Figure 5.8). Influence diagnostics for perturbing cluster variances (Figures 5.9, 5.10, and 5.11) suggest that the assumption of equal variance for the 12th random effect is not influential. Therefore, it is deemed that differential weighting of observation 33 is the main cause of sensitivity, and since the value was considered suspect a priori, deletion or substantial down-weighting of this single observation seems warranted.










[Figure 5.10: Corn data; Scheme 3; $v_{\max}$'s for individual BLUEs. (a) $v_{\max}$ for $\hat{\beta}_0$; (b) $|v_{\max}|$ for $\hat{\beta}_0$ star-plot; (c) $v_{\max}$ for $\hat{\beta}_{corn}$; (d) $|v_{\max}|$ for $\hat{\beta}_{corn}$ star-plot; (e) $v_{\max}$ for $\hat{\beta}_{soy}$; (f) $|v_{\max}|$ for $\hat{\beta}_{soy}$ star-plot.]



[Figure 5.11: Corn data; Scheme 3; influence on predicted means: basic curvatures and maximum curvatures.]



CHAPTER 6
MULTI-STAGE ESTIMATIONS AND FUTURE RESEARCH

In this chapter, several areas of future work are outlined, most of which stem from the generalization of local assessment to estimating functions in Chapter 4. First, some preliminary results are given for extending the technique to multi-stage estimations. Then, a synopsis of the dissertation is presented along with some additional areas of future research.

6.1 Local Assessment in Multi-stage Estimations

Sometimes inference takes place in more than one formal stage. One example is linear mixed models when variance components are unknown. Restricted/residual maximum likelihood (REML) (Patterson and Thompson 1971; Robinson 1987; Searle et al. 1992) can be used to estimate the variance parameters. Then the mixed model equations are used to estimate $\beta$ and predict $u$, with the estimated variance parameters treated as known. Local influence techniques can be used to assess the effects of perturbations on functions of the two sets of parameter estimates using results from Chapter 4.

First, let $\hat{\theta}_1$ denote the solution of $p_1$ estimating equations $h_1(\theta_1; y) = 0$. Further, let $\hat{\theta}_2$ be the solution of $p_2$ estimating equations $h_2(\theta_2; y, \hat{\theta}_1) = 0$ for fixed $\hat{\theta}_1$. Perturbation leads to two sets of perturbed equations, $h_{1,\omega}(\theta_1; y) = 0$ and $h_{2,\omega}(\theta_2; y, \hat{\theta}_{1,\omega}) = 0$, with solutions $\hat{\theta}_{1,\omega}$ and $\hat{\theta}_{2,\omega}$ respectively. The effects of perturbation are measured by $m$, which depends on $\omega$ explicitly and via $\hat{\theta}_{1,\omega}$ and $\hat{\theta}_{2,\omega}$. The algebraic expressions and computational formulas for first- and second-order influence given in Chapter 4 can be used by simply using the notation $\theta' = (\theta_1', \theta_2')$.



Because $h_{1,\omega}(\theta_1; y)$ does not depend on $\theta_{2,\omega}$, there are some simplifications in the computational formulas. For instance, using results on partitioned matrices (Searle 1982),

$$\dot{h}^{-1} = \begin{bmatrix} \dot{h}_{11}^{-1} & 0 \\ -\dot{h}_{22}^{-1}\dot{h}_{21}\dot{h}_{11}^{-1} & \dot{h}_{22}^{-1} \end{bmatrix}. \qquad (6.1)$$

Partitioning $\Delta$ leads to a simplification in $J$:

$$J = -\dot{h}^{-1}\Delta = -\dot{h}^{-1}\begin{bmatrix} \Delta_1 \\ \Delta_2 \end{bmatrix} = \begin{bmatrix} -\dot{h}_{11}^{-1}\Delta_1 \\ \dot{h}_{22}^{-1}\big(\dot{h}_{21}\dot{h}_{11}^{-1}\Delta_1 - \Delta_2\big) \end{bmatrix}. \qquad (6.2)$$


6.2 Synopsis and Additional Future Research

The local influence approach to diagnostics is largely unknown, owing in part to the lack of introductory resources on the technique. It is hoped that the extended and detailed discussion in the first two chapters of this thesis have shed some light on this powerful and practical methodology. The diagnostics derived for perturbing the entire X matrix in Chapter 3 give previously unavailable detail about how collinearity and overfitting can affect estimation. The results should encourage researchers to utilize perturbations other than just case-weights and place more emphasis on finding closed-forms for the resulting diagnostics. Because analysts are unaccustomed to using directional vectors, having interpretable formulas for special cases will help them understand and appreciate the results.

The foundation of the local influence methodology is a Taylor series approximation to a high-dimensional surface combined with some familiar geometry concepts: slopes and accelerations. Using these ideas, the generalization to a class of influence measures for optimal estimating functions in Chapter 4 is straightforward. This extension of the methodology provides many avenues for future research; quasi-likelihood and mixed model estimation are only the beginning of the areas of application.

As for other areas of research, the idea of perturbations is an excellent concept for linking influence diagnostics, sensitivity analysis, robust inference, and aspects of optimal design. Robust estimation techniques are those that yield inferences that are largely insensitive to perturbations. Consequently, minor aspects of the data, model, and inference technique do not overly influence scientific conclusions. If a perturbation and influence measure are appropriate for several methods of inference, the maximum local influence could be used to compare the techniques' robustness. In fact, minimizing the maximum local influence could even serve as a criterion for robustness (minimax LI estimation). Optimal sampling and experimental designs, which are efficient under assumed conditions, should also be designed so that they are insensitive to perturbation. Local assessment could be used to assess how sampling and experimental designs can ameliorate or compound influential phenomena.









APPENDIX A
MODEL FITTING INFORMATION FOR SCHOOL DATA

NOTE: No intercept in model. R-square is redefined.
Dependent Variable: VSCORE

Analysis of Variance

                        Sum of       Mean
  Source       DF      Squares     Square    F Value   Prob>F
  Model         5     17.21982    3.44396     29.019   0.0001
  Error        15      1.78018    0.11868
  U Total      20     19.00000

  Root MSE     0.34450       R-square    0.9063
  Dep Mean     0.00000       Adj R-sq    0.8751
  C.V.         -6.745588E16











Parameter Estimates

                 Parameter      Standard    T for H0:
  Variable  DF    Estimate         Error    Parameter=0   Prob > |T|
  SALARY     1   -0.139993    0.09301762         -1.505       0.1531
  WCOLR      1    0.194122    0.22907761          0.847       0.4101
  SES        1    0.919607    0.14859782          6.189       0.0001
  TSCORE     1    0.250736    0.09464627          2.649       0.0182
  MOMED      1   -0.203697    0.22031344         -0.925       0.3698

                  Variance
  Variable  DF   Inflation
  SALARY     1  1.38519879
  WCOLR      1  8.40130895
  SES        1  3.53513942
  TSCORE     1  1.43413046
  MOMED      1  7.77076288









Collinearity Diagnostics

                        Condition  Var Prop  Var Prop  Var Prop  Var Prop  Var Prop
  Number  Eigenvalue       Index    SALARY     WCOLR       SES    TSCORE     MOMED
       1     2.83681     1.00000    0.0134    0.0129    0.0296    0.0071    0.0142
       2     1.39508     1.42598    0.2192    0.0040    0.0020    0.2456    0.0026
       3     0.49664     2.38997    0.7601    0.0006    0.0064    0.6485    0.0004
       4     0.20253     3.74262    0.0011    0.0655    0.9471    0.0418    0.1258
       5     0.06894     6.41490    0.0062    0.9170    0.0149    0.0570    0.8570



         Dep Var    Predict   Std Err              Std Err   Student
  Obs     VSCORE      Value   Predict   Residual  Residual  Residual
    1     0.3314     0.2714     0.227     0.0600     0.260     0.231
    2    -1.4737    -1.4135     0.228    -0.0602     0.259    -0.233
    3     0.2454     0.9244     0.099    -0.6790     0.330    -2.058
    4     0.9657     1.0471     0.120    -0.0814     0.323    -0.252
    5     0.3468     0.2126     0.123     0.1342     0.322     0.417
    6    -0.2033    -0.1885     0.231    -0.0147     0.256    -0.058
    7     1.1548     1.0312     0.150     0.1236     0.310     0.398
    8    -0.2892    -0.2146     0.083    -0.0747     0.334    -0.223
    9     1.0190     0.9116     0.167     0.1073     0.301     0.356
   10     0.3640     0.3278     0.260     0.0362     0.226     0.160
   11    -2.0255    -1.6459     0.169    -0.3796     0.300    -1.265
   12     0.0202    -0.2800     0.205     0.3002     0.277     1.083
   13    -0.0314     0.1489     0.195    -0.1803     0.284    -0.634
   14    -0.3408    -0.2814     0.084    -0.0594     0.334    -0.178
   15    -2.1287    -1.8229     0.188    -0.3058     0.289    -1.058
   16     0.7938     0.5707     0.113     0.2230     0.326     0.685
   17    -0.5643    -0.3168     0.169    -0.2475     0.300    -0.825
   18    -0.5815    -1.4413     0.181     0.8598     0.293     2.932
   19     1.3783     1.1853     0.167     0.1930     0.301     0.640
   20     1.0190     0.9738     0.143     0.0452     0.313     0.144



         Cook's              Hat Diag       Cov
  Obs         D   Rstudent         H      Ratio     Dffits
    1     0.008     0.2236    0.4325     2.4440     0.1952
    2     0.008    -0.2251    0.4362     2.4596    -0.1980
    3     0.077    -2.3475    0.0831     0.2929    -0.7067
    4     0.002    -0.2441    0.1212     1.5729    -0.0906
    5     0.005     0.4055    0.1281     1.5276     0.1555
    6     0.001    -0.0557    0.4499     2.5639    -0.0504
    7     0.007     0.3868    0.1891     1.6511     0.1868
    8     0.001    -0.2160    0.0575     1.4734    -0.0534
    9     0.008     0.3457    0.2354     1.7697     0.1918
   10     0.007     0.1545    0.5680     3.2410     0.1771
   11     0.102    -1.2931    0.2414     1.0588    -0.7295
   12     0.128     1.0900    0.3527     1.4516     0.8046
   13     0.038    -0.6212    0.3193     1.8106    -0.4254
   14     0.000    -0.1720    0.0588     1.4844    -0.0430
   15     0.094    -1.0629    0.2964     1.3615    -0.6899
   16     0.011     0.6725    0.1069     1.3486     0.2326
   17     0.043    -0.8152    0.2406     1.4745    -0.4589
   18     0.654     4.3370    0.2756     0.0276     2.6748
   19     0.025     0.6272    0.2346     1.6059     0.3472
   20     0.001     0.1394    0.1727     1.6948     0.0637



190 





         SALARY       WCOLR         SES      TSCORE       MOMED
Obs     Dfbetas     Dfbetas     Dfbetas     Dfbetas     Dfbetas

  1      0.1627     -0.0882      0.0679     -0.0276      0.0339
  2     -0.0660     -0.1443      0.0717      0.0126      0.1444
  3      0.0734     -0.0199      0.0654     -0.2036     -0.2487
  4     -0.0033      0.0428     -0.0220     -0.0051     -0.0563
  5      0.0769     -0.1122      0.1070     -0.0453      0.0394
  6      0.0036      0.0155     -0.0154      0.0418     -0.0105
  7     -0.0748      0.1413     -0.0036      0.0380     -0.0973
  8      0.0312     -0.0014     -0.0220     -0.0142      0.0250
  9      0.0130      0.1498     -0.0105      0.1046     -0.1325
 10     -0.1007     -0.0145      0.0400      0.1489     -0.0279
 11      0.2465      0.0798      0.3961      0.1611     -0.2056
 12      0.0446      0.2661      0.4139     -0.2827     -0.6298
 13      0.1588      0.2764     -0.3238      0.0694     -0.0699
 14      0.0169      0.0145      0.0204     -0.0276     -0.0236
 15     -0.0457     -0.0462      0.6017     -0.1181     -0.1775
 16      0.1471      0.0342     -0.0379     -0.0840      0.0415
 17     -0.4650      0.0429      0.0595      0.2269     -0.0573
 18     -0.2034     -1.1197     -1.8118      0.0013      1.9401
 19     -0.1005      0.0353     -0.1001      0.0803      0.1385
 20     -0.0336      0.0341     -0.0005      0.0048     -0.0165
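Output of the form shown above is produced by the REG procedure. As a point of reference, a minimal sketch of such a call follows; the data set name is an assumption (the variable names are taken from the output), and the five-component collinearity table suggests the intercept-free variant COLLINOINT (or a NOINT fit).

    proc reg data=school;
       /* r:         residual analysis (predicted values, student
                     residuals, Cook's D)
          influence: Rstudent, hat diagonal, Cov Ratio, Dffits, Dfbetas
          vif, collinoint: variance inflation and collinearity tables */
       model vscore = salary wcolr ses tscore momed
             / r influence vif collinoint;
    run;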




APPENDIX B 
MODEL FITTING INFORMATION FOR CORN CROP DATA 

The MIXED Procedure 
Class Level Information 
Class Levels Values 

CNTYNUM    12    1 2 3 4 5 6 7 8 9 10 11 12

REML Estimation Iteration History

Iteration   Evaluations       Objective     Criterion

        0             1    243.72248590
        1             2    242.59476305    0.00000000

Convergence criteria met. 
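For orientation, the REML objective function reported above equals -2 times the residual log-likelihood minus the constant (n - p) log(2 pi); with n = 37 observations and p = 3 fixed-effect parameters (see the model fitting information below), 305.0826 - 34 log(2 pi) is approximately 242.595, the converged value in the iteration history.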

Covariance Parameter Estimates (REML) 

Cov Parm Estimate Std Error Z Pr > |Z| 

CNTYNUM 63.28953258 75.07490859 0.84 0.3992 
Residual 297.72802207 84.64939011 3.62 0.0004 

Asymptotic Covariance Matrix of Estimates

Cov Parm     Row           COVP1           COVP2
CNTYNUM        1    5636.2419002    -2123.731245
Residual       2    -2123.731245    7148.5993683



Model Fitting Information for HECTCORN

Description                            Value
Observations                              37
Res Log Likelihood                  -152.541
Akaike's Information Criterion      -154.541
Schwarz's Bayesian Criterion        -156.068
-2 Res Log Likelihood               305.0826
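These values are consistent with the "larger is better" definitions used by this release of PROC MIXED: with residual log-likelihood -152.541 and q = 2 covariance parameters, AIC = -152.541 - 2 = -154.541 and SBC = -152.541 - (q/2) log(n - p) = -152.541 - log 34, approximately -156.07.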



Solution for Fixed Effects

Effect            Estimate      Std Error    DF        t    Pr > |t|
INTERCEPT     120.74028537     3.78206463    11    31.92      0.0001
PIX_CORN       26.76987997     4.56957580    23     5.64      0.0001
PIX_SOY        -2.04732793     4.56652013    23    -0.45      0.6574






Solution for Random Effects

Effect     CNTYNUM       Estimate        SE Pred    DF        t    Pr > |t|
CNTYNUM          1     2.18380161     7.35943306    23     0.30      0.7693
CNTYNUM          2     1.47455727     7.30784520    23     0.20      0.8419
CNTYNUM          3    -4.72910256     7.26641831    23    -0.65      0.5216
CNTYNUM          4    -2.76387525     6.88938048    23    -0.40      0.6920
CNTYNUM          5     8.36866398     6.42653277    23     1.30      0.2057
CNTYNUM          6     4.27362586     6.52255117    23     0.66      0.5188
CNTYNUM          7    -2.70476376     6.50319394    23    -0.42      0.6813
CNTYNUM          8     1.15642520     6.47806611    23     0.18      0.8599
CNTYNUM          9     5.02562759     6.21203551    23     0.81      0.4268
CNTYNUM         10    -2.88274430     5.92509787    23    -0.49      0.6312
CNTYNUM         11    -8.65058657     5.88291972    23    -1.47      0.1550
CNTYNUM         12    -0.75162907     5.69287011    23    -0.13      0.8961



Tests of Fixed Effects

Source       NDF    DDF    Type III F    Pr > F
PIX_CORN       1     23         31.80    0.0001
PIX_SOY        1     23          0.20    0.6574
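Since each fixed effect carries a single numerator degree of freedom, these F statistics are the squares of the corresponding t statistics in the fixed-effects solution: 5.64^2 is approximately 31.8 and (-0.45)^2 approximately 0.20.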






Predicted Values 






 HECTCORN    Predicted     SE Pred         L95         U95    Residual

 165.7600     155.4869      9.7838    135.2477    175.7262     10.2731
  96.3200      89.3834      8.8869     70.9994    107.7674      6.9366
  76.0800      98.3267      8.1004     81.5698    115.0837    -22.2467
 185.3500     170.5417      8.5361    152.8834    188.2000     14.8083
 116.4300     144.2402      7.5387    128.6451    159.8353    -27.8102
 162.0800     154.4196      7.0506    139.8343    169.0049      7.6604
 152.0400     125.5822      6.7333    111.6532    139.5112     26.4578
 161.7500     156.5001      7.1277    141.7553    171.2449      5.2499
  92.8800      91.0834      7.9300     74.6789    107.4879      1.7966
 149.9400     131.2890      7.2587    116.2733    146.3048     18.6510
  64.7500      65.0934      8.7890     46.9121     83.2748     -0.3434
 127.0700     141.4215      7.1954    126.5367    156.3062    -14.3515
 133.5500     118.8645      7.1089    104.1587    133.5704     14.6855
  77.7000      90.7578      7.6708     74.8896    106.6260    -13.0578
 206.3900     184.9299      9.1302    166.0426    203.8172     21.4601
 108.3300     118.7686      6.7585    104.7876    132.7496    -10.4386
 118.1700     123.7514      7.6313    107.9648    139.5380     -5.5814
  99.9600     106.1059      7.3867     90.8254    121.3864     -6.1459
 140.4300     123.6154      6.2159    110.7567    136.4740     16.8146
  98.9500      91.7140      8.0456     75.0704    108.3575      7.2360
 131.0400     125.3031      7.5450    109.6951    140.9111      5.7369
 114.1200     123.9749      5.9584    111.6490    136.3009     -9.8549
 100.6000      97.0015      6.3382     83.8900    110.1130      3.5985
 127.8800     139.1748      6.4405    125.8516    152.4980    -11.2948
 116.9000     107.4351      5.9034     95.2230    119.6473      9.4649
  87.4100      92.8847      6.9265     78.5562    107.2132     -5.4747
  93.4800      85.2028      9.2386     66.0913    104.3142      8.2772
 121.0000     138.6914      6.9531    124.3079    153.0749    -17.6914
 109.9100     127.4057      8.0247    110.8053    144.0061    -17.4957
 122.6600     129.0737      6.0804    116.4955    141.6519     -6.4137
 104.2100     111.5808      6.1297     98.9005    124.2610     -7.3708
  88.5900      89.8508      6.3707     76.6720    103.0297     -1.2608
  88.5900     139.1245      8.2468    122.0648    156.1843    -50.5345
 165.3500     142.4030      6.2021    129.5729    155.2330     22.9470
 104.0000     106.1154      5.7054     94.3129    117.9179     -2.1154
  88.6300      75.2417      8.4331     57.7964     92.6869     13.3883
 163.7000     139.6604      6.3394    126.5464    152.7744     14.0396
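The listings in this appendix have the form produced by PROC MIXED (cf. Littell et al., 1996). A minimal sketch of a call yielding tables of this form appears below; the data set name is an assumption, and the variable names are taken from the output.

    proc mixed data=cornpixels covtest asycov;
       class cntynum;
       /* s prints the fixed-effects solution; p prints predicted
          values, 95% limits, and residuals for each observation */
       model hectcorn = pix_corn pix_soy / s p;
       /* random county intercepts; s prints their predictions (EBLUPs) */
       random cntynum / s;
    run;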



REFERENCES 

Atkinson, A. C. (1986), "Comments on 'Influential Observations, High Leverage 
Points, and Outliers in Linear Regression'," Statistical Science, 1, 397-402. 

Barlow, W. E. (1997), "Global Measures of Local Influence for Proportional 
Hazards Regression Models," Biometrics, 53, 1157-1162. 

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988), "An Error-components

Model for Prediction of County Crop Areas Using Survey and Satellite Data," 
Journal of the American Statistical Association, 83, 28-36. 

Beckman, R. J., Nachtsheim, C. J., and Cook, R. D. (1987), "Diagnostics for 
Mixed-model Analysis of Variance," Technometrics, 29, 413-426. 

Belsley, D. A., Kuh, E., and Welsch, R. E. (1980), Regression Diagnostics: 

Identifying Influential Data and Sources of Collinearity, New York: Wiley. 

Bhapkar, V. P. (1972), "On a Measure of Efficiency of An Estimating Equation," 
Sankhyā, Series A, Indian Journal of Statistics, 34, 467-472.

Billor, N. and Loynes, R. M. (1993), "Local Influence: A New Approach," 

Communications in Statistics, Part A - Theory and Methods, 22, 1595-1611. 

Bjørnstad, J. F. (1990), "Predictive Likelihood: A Review (with discussion),"
Statistical Science, 5, 242-265. 

Booth, J. G. and Hobert, J. P. (1998), "Standard Errors of Prediction in 

Generalized Linear Mixed Models," Journal of the American Statistical 
Association, 93, 262-272. 

Box, G. E. P. (1982), "Choice of Response Surface Design and Alphabetic 
Optimality," Utilitas Mathematica, 21, 11-55. 

Brown, G. C. (1997), "Local Influence Regression Diagnostics for Gamma Data," 
Communications in Statistics, Part A - Theory and Methods, 26, 3051-3067. 

Butler, R. W. (1986), "Predictive Likelihood Inference With Applications (with 
discussion)," Journal of the Royal Statistical Society, Series B, 48, 1-38. 

Cadigan, N. G. (1995), "Local Influence in Structural Equation Models," Structural 
Equation Modeling, 2, 13-30. 

Cadigan, N. G. and Farrell, P. J. (1999), "Expected Local Influence in the

Normal Linear Regression Model," Statistics & Probability Letters, 41, 25-30. 


Cook, R. D. (1977), "Detection of Influential Observation in Linear Regression," 
Technometrics, 19, 15-18. 

Cook, R. D. (1979), "Influential Observations in Linear Regression," Journal of the 
American Statistical Association, 74, 169-174. 

Cook, R. D. (1986), "Assessment of Local Influence (with discussion)," Journal of 
the Royal Statistical Society, Series B, 48, 133-169. 

Cook, R. D. and Weisberg, S. (1991), "Added Variable Plots in Linear Regression," 
In Stahel, W. and Weisberg, S., editors, Directions in Robust Statistics and
Diagnostics. Part I, pages 47-60, New York: Springer-Verlag.

Cox, D. R. and Reid, N. (1987), "Parameter Orthogonality and Approximate 
Conditional Inference (with discussion)," Journal of the Royal Statistical 
Society, Series B, 49, 1-39. 

Crowder, M. (1986), "On Consistency and Inconsistency of Estimating Equations," 
Econometric Theory, 2, 305-330. 

Crowder, M. (1987), "On Linear and Quadratic Estimating Functions," 
Biometrika, 74, 591-597.

Desmond, A. F. (1997a), "Optimal Estimating Functions, Quasi-likelihood and 
Statistical Modelling (with discussion)," Journal of Statistical Planning and 
Inference, 60, 77-121. 

Desmond, A. F. (1997b), "Reply to Comments on 'Optimal Estimating Functions, 
Quasi-likelihood and Statistical Modelling'," Journal of Statistical Planning 
and Inference, 60, 116-121. 

Escobar, L. A. and Meeker, W. Q., Jr. (1992), "Assessing Influence in
Regression Analysis With Censored Data," Biometrics, 48, 507-528. 

Farebrother, R. W. (1992), "Relative Local Influence and the Condition Number," 
Communications in Statistics, Part B - Simulation and Computation, 21, 
707-710. 

Firth, D. (1987), "On the Efficiency of Quasi-likelihood Estimation," Biometrika, 
74, 233-245. 

Fuller, W. A. (1987), Measurement Error Models, New York: Wiley. 

Fung, W. K. and Kwan, C. W. (1997), "A Note on Local Influence Based on Normal 
Curvature," Journal of the Royal Statistical Society, Series B, 59, 839-843. 

Fung, W. K. and Tang, M. K. (1997), "Assessment of Local Influence in 

Multivariate Regression Analysis," Communications in Statistics, Part A - 
Theory and Methods, 26, 821-837. 



Galea, M., Paula, G. A., and Bolfarine, H. (1997), "Local Influence in Elliptical 
Linear Regression Models," The Statistician, 46, 71-79. 

Geisser, S. (1991), "Diagnostics, Divergences and Perturbation Analysis," In 
Stahel, W. and Weisberg, S., editors, Directions in Robust Statistics and 
Diagnostics. Part I, pages 89-100, New York: Springer-Verlag.

Geisser, S. (1993), Predictive Inference. An Introduction, New York: Chapman & 
Hall. 

Godambe, V. P. (1960), "An Optimum Property of Regular Maximum Likelihood 
Estimation," The Annals of Mathematical Statistics, 31, 1208-1212. 

Godambe, V. P. (1976), "Conditional Likelihood and Unconditional Optimum 
Estimating Equations," Biometrika, 63, 277-284. 

Godambe, V. P. (1991), "Orthogonality of Estimating Functions and Nuisance 
Parameters," Biometrika, 78, 143-151. 

Godambe, V. P. and Heyde, C. C. (1987), "Quasi-likelihood and Optimal 
Estimation," International Statistical Review, 55, 231-244. 

Godambe, V. P. and Thompson, M. E. (1989), "An Extension of Quasi-likelihood 
Estimation (with discussion)," Journal of Statistical Planning and Inference, 
22, 137-172.

Henderson, C. R., Kempthorne, O., Searle, S. R., and von Krosigk, C. N. (1959),
"Estimation of Environmental and Genetic Trends from Records Subject to 
Culling," Biometrics, 15, 192-218. 

Hilden-Minton, J. A. (1995), Multilevel Diagnostics for Mixed and Hierarchical 
Linear Models, PhD thesis, The University of California, Los Angeles. 

Huynh, H. (1982), "A Comparison of Four Approaches to Robust Regression," 
Psychological Bulletin, 92, 505-512. 

Johnson, R. A. and Wichern, D. W. (1992), Applied Multivariate Statistical
Analysis (3rd ed.), Englewood Cliffs, NJ: Prentice-Hall. 

Jung, K.-M., Kim, M. G., and Kim, B. C. (1997), "Local Influence in Maximum 
Likelihood Factor Analysis," Computational Statistics and Data Analysis, 24, 
483-491. 

Kass, R. E., Tierney, L., and Kadane, J. B. (1989), "Approximate Methods for 
Assessing Influence and Sensitivity in Bayesian Analysis," Biometrika, 76, 
663-674. 

Kim, C. (1996a), "Local Influence and Replacement Measure," Communications in 
Statistics, Part A - Theory and Methods, 25, 49-61. 




Kim, M. G. (1995), "Local Influence in Multivariate Regression," Communications 
in Statistics, Part A - Theory and Methods, 24, 1271-1278. 

Kim, M. G. (1996b), "Local Influence in Box-Cox Transformation for Multivariate 
Regression Data," Communications in Statistics, Part A - Theory and 
Methods, 25, 2955-2969. 

Kim, M. G. (1996c), "Local Influence in Multivariate Normal Data," Journal of 
Applied Statistics, 23, 535-541.

Kim, M. G. (1998), "Local Influence on a Test of Linear Hypothesis in Multiple 
Regression Model," Journal of Applied Statistics, 25, 145-152. 

Kim, Y.-I. (1991), "Local Influence Approach Diagnostics for Optimal 

Experimental Design," Korean Journal of Applied Statistics, 4, 195-207. 

Lavine, M. (1992), "Local Predictive Influence in Bayesian Linear Models With - 
Conjugate Priors," Communications in Statistics, Part B - Simulation and 
Computation, 21, 269-283. 

Lawrance, A. J. (1988), "Regression Transformation Diagnostics Using Local 

Influence," Journal of the American Statistical Association, 83, 1067-1072. 

Lawrance, A. J. (1991), "Local and Deletion Influence," In Stahel, W. and 

Weisberg, S., editors. Directions in Robust Statistics and Diagnostics. Part I, 
pages 141-157, New York: Springer- Verlag. 

Lee, A. H. and Zhao, Y. (1996), "Assessing Local Influence in Measurement Error 
Models," Biometrical Journal. Journal of Mathematical Methods in 
Biosciences, 38, 829-841. 

Lee, A. H. and Zhao, Y. (1997), "Assessing Influence on the Goodness-of-link in 
Generalized Linear Models," Statistics & Probability Letters, 31, 351-358. 

Lee, Y. and Nelder, J. A. (1996), "Hierarchical Generalized Linear Models (with 
discussion)," Journal of the Royal Statistical Society, Series B, 58, 619-678.

Lejeune, M. and Faulkenberry, G. D. (1982), "A Simple Predictive Density 

Function (with discussion)," Journal of the American Statistical Association, 
77, 654-659. 

Lesaffre, E., Asefa, M., and Verbeke, G. (1999), "Assessing the Goodness-of-Fit of
the Laird and Ware Model - an Example: The Jimma Infant Survival
Differential Longitudinal Study," Statistics in Medicine, 18, 835-854. 

Lesaffre, E. and Verbeke, G. (1998), "Local Influence in Linear Mixed Models,"
Biometrics, 54, 570-582. 




Levy, M. S. and Perng, S. K. (1984), "A Maximum Likelihood Prediction Function 
for the Linear Model With Consistency Results," Communications in 
Statistics, Part A - Theory and Methods, 13, 1257-1273. 

Liang, K.-Y. and Zeger, S. L. (1986), "Longitudinal Data Analysis Using 
Generalized Linear Models," Biometrika, 73, 13-22. 

Lindsay, B. (1982), "Conditional Score Functions: Some Optimality Results," 
Biometrika, 69, 503-512. 

Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D. (1996), SAS
System for Mixed Models, Cary, NC: SAS Institute.

Loynes, R. M. (1986), "Comments on 'Assessment of Local Influence'," Journal of
the Royal Statistical Society, Series B, 48, 156-157. 

Lu, J., Ko, D., and Chang, T. (1997), "The Standardized Influence Matrix and Its 
Applications," Journal of the American Statistical Association, 92, 1572-1580. 

Mathiasen, P. E. (1979), "Prediction Functions," Scandinavian Journal of 
Statistics, 6, 1-21. 

McCullagh, P. (1983), "Quasi-likelihood Functions," The Annals of Statistics, 11, 
59-67. 

McCullagh, P. and Nelder, J. A. (1989), Generalized Linear Models (2nd Ed.), 
London: Chapman & Hall. 

McCulloch, C. E. (1997), "Maximum Likelihood Algorithms for Generalized Linear 
Mixed Models," Journal of the American Statistical Association, 92, 162-170. 

McCulloch, R. E. (1989), "Local Model Influence," Journal of the American 
Statistical Association, 84, 473-478. 

Morton, R. (1981), "Efficiency of Estimating Equations and the Use of Pivots," 
Biometrika, 68, 227-233. 

Pan, J.-X., Fang, K.-T., and Liski, E. P. (1996), "Bayesian Local Influence for the 
Growth Curve Model With Rao's Simple Covariance Structure," Journal of 
Multivariate Analysis, 58, 55-81. 

Pan, J.-X., Fang, K.-T., and von Rosen, D. (1997), "Local Influence Assessment in 
the Growth Curve Model With Unstructured Covariance," Journal of 
Statistical Planning and Inference, 62, 263-278. 

Patterson, H. D. and Thompson, R. (1971), "Recovery of Inter-block Information 
When Block Sizes Are Unequal," Biometrika, 58, 545-554. 

Pettitt, A. N. and Daud, I. B. (1989), "Case-weighted Measures of Influence for 
Proportional Hazards Regression," Applied Statistics, 38, 51-67. 




Poon, W.-Y. and Poon, Y. S. (1999), "Conformal Normal Curvature and 

Assessment of Local Influence," Journal of the Royal Statistical Society, 
Series B, 61, 51-61.

Prentice, R. L. (1988), "Correlated Binary Regression With Covariates Specific to 
Each Binary Observation," Biometrics, 44, 1033-1048. 

Rancel, M. M. S. and Sierra, M. A. G. (1999), "Measures and Procedures for the 
Identification of Locally Influential Observations in Linear Regression," 
Communications in Statistics, Part A - Theory and Methods, 28, 343-366. 

Robinson, D. L. (1987), "Estimation and Use of Variance Components," The 
Statistician, 36, 3-14. 

Schall, R. and Dunne, T. T. (1992), "A Note on the Relationship Between 
Parameter Collinearity and Local Influence," Biometrika, 79, 399-404. 

Schall, R. and Gonin, R. (1991), "Diagnostics for Nonlinear Lp-norm Estimation," 
Computational Statistics and Data Analysis, 11, 189-198. 

Schwarzmann, B. (1991), "A Connection Between Local-influence Analysis and 
Residual Diagnostics," Technometrics, 33, 103-104. 

Searle, S. R. (1971), Linear Models, New York: Wiley. 

Searle, S. R. (1982), Matrix Algebra Useful for Statistics, New York: Wiley. 

Searle, S. R., Casella, G., and McCulloch, C. E. (1992), Variance Components, New
York: Wiley. 

Shi, L. (1997), "Local Influence in Principal Components Analysis," Biometrika, 
84, 175-186.

St. Laurent, R. T. and Cook, R. D. (1993), "Leverage, Local Influence and 
Curvature in Nonlinear Regression," Biometrika, 80, 99-106. 

Swokowski, E. W. (1988), Calculus with Analytic Geometry (2nd ed.), Boston:
PWS-Kent. 

Tang, M. K. and Fung, W. K. (1996), "First Order Local Influence of Test 

Statistics in Multivariate Regression," Sankhyā, Series B, Indian Journal of
Statistics, 58, 323-337. 

Tawn, J. (1987), "Comments on 'Parameter Orthogonality and Approximate 

Conditional Inference'," Journal of the Royal Statistical Society, Series B, 49, 
33-34.

Thomas, W. (1990), "Influence on Confidence Regions for Regression Coefficients in 
Generalized Linear Models," Journal of the American Statistical Association, 
85, 393-397. 




Thomas, W. (1991), "Influence Diagnostics for the Cross- validated Smoothing 
Parameter in Spline Smoothing," Journal of the American Statistical 
Association, 86, 693-698. 

Thomas, W. and Cook, R. D. (1989), "Assessing Influence on Regression 
Coefficients in Generalized Linear Models," Biometrika, 76, 741-749. 

Thomas, W. and Cook, R. D. (1990), "Assessing Influence on Predictions From 
Generalized Linear Models," Technometrics, 32, 59-65. 

Tsai, C.-L. and Wu, X. (1992), "Assessing Local Influence in Linear Regression 

Models With First-order Autoregressive Or Heteroscedastic Error Structure," 
Statistics & Probability Letters, 14, 247-252. 

Vos, P. W. (1991), "A Geometric Approach to Detecting Influential Cases," The 
Annals of Statistics, 19, 1570-1581. 

Wang, S.-J. and Lee, S.-Y. (1996), "Sensitivity Analysis of Structural Equation 

Models With Equality Functional Constraints," Computational Statistics and 
Data Analysis, 23, 239-256. 

Wang, X., Ren, S., and Shi, L. (1996), "Local Influence in Discriminant Analysis," 
Journal of Systems Science and Mathematical Science, 8, 27-36. 

Wedderburn, R. W. M. (1974), "Quasi-likelihood Functions, Generalized Linear 
Models, and the Gauss-Newton Method," Biometrika, 61, 439-447. 

Weiss, R. (1996), "An Approach to Bayesian Sensitivity Analysis," Journal of the 
Royal Statistical Society, Series B, 58, 739-750.

Weissfeld, L. A. (1990), "Influence Diagnostics for the Proportional Hazards 
Model," Statistics & Probability Letters, 10, 411-417. 

Weissfeld, L. A. and Schneider, H. (1990), "Influence Diagnostics for the Weibull
Model Fit to Censored Data," Statistics & Probability Letters, 9, 67-73. 

Wu, X. and Luo, Z. (1993a), "Residual Sum of Squares and Multiple Potential, 
Diagnostics By a Second Order Local Approach," Statistics & Probability 
Letters, 16, 289-296. 


Wu, X. and Luo, Z. (1993c), "Second-Order Approach to Local Influence," Journal 
of the Royal Statistical Society, Series B, 55, 929-936. 

Wu, X. and Wan, F. (1994), "A Perturbation Scheme for Nonlinear Models," 
Statistics & Probability Letters, 20, 197-202. 






Zeger, S. L. and Liang, K.-Y. (1986), "Longitudinal Data Analysis for Discrete and 
Continuous Outcomes," Biometrics, 42, 121-130. 

Zhao, Y. and Lee, A. H. (1995), "Assessment of Influence in Non-linear 

Measurement Error Models," Journal of Applied Statistics, 22, 215-225. 



BIOGRAPHICAL SKETCH 
Glen Hartless was born in Summit, New Jersey in 1971. As he grew up, Glen 
always had an interest in "keeping the numbers." While researching careers in 
applied mathematics in the tenth grade, he stumbled upon the statistics profession 
and knew that it was perfect for him. (Considering the content of his dissertation, it 
is ironic that frustration with the abstractness of matrices is what led him to the 
field.) Interestingly, while he was still in high school, his father arranged for him to 
meet Daryl Pregibon, a pioneer in developing diagnostics for logistic regression. 
Perhaps this chance meeting influenced Glen more than he knows. As an
undergraduate at Virginia Tech, he completed a thesis on nonparametric density
estimation. Glen also studied exploratory graphical techniques for analyzing 
reliability data while at an NSF-sponsored Research Experience for Undergraduates 
(REU) program at the College of William & Mary. After receiving his B.S. in 
statistics from Virginia Tech in 1994, he attended the University of Florida in 
pursuit of his Master of Statistics and Ph.D. degrees. There, he did collaborative 
research for four years in the Department of Family, Youth & Community Sciences. 
Glen also met and married his wife, Christine, in Gainesville. 






I certify that I have read this study and that in my opinion it conforms to 
acceptable standards of scholarly presentation and is fully adequate, in scope and 
quality, as a dissertation for the degree of Doctor of Philosophy. 






James G. Booth, Chairman
Professor of Statistics 



I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.




Ramon C. Littell, Cochairman
Professor of Statistics 

I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.




James P. Hobert
Associate Professor of Statistics



I certify that I have read this study and that in my opinion it conforms to 
acceptable standards of scholarly presentation and is fully adequate, in scope and 
quality, as a dissertation for the degree of Doctor of Philosophy. 




Randolph L. Carter 
Professor of Statistics 

I certify that I have read this study and that in my opinion it conforms to 
acceptable standards of scholarly presentation and is fully adequate, in scope and 
quality, as a dissertation for the degree of Doctor of Philosophy. 




James J. Algina
Professor of Educational Psychology



This dissertation was submitted to the Graduate Faculty of the Department of 
Statistics in the College of Liberal Arts and Sciences and to the Graduate School 
and was accepted as partial fulfillment of the requirements for the degree of Doctor 
of Philosophy. 

August 2000 



Dean, Graduate School 


